Creating a cumulative distribution from a vector - matlab

I need to create a cumulative distribution from some numbers contained in a vector. The vector counts the number of times a dot product operation occurs in an algorithm I've been given.
An example vector would be
myVector = [100 102 101 99 98 100 101 110 102 101 100 99]
I'd like to plot the probability that I have fewer than 99 dot products, against a range from 0 to 120. The built-in function
Cumdist(MyVector)
isn't appropriate, as I need to plot over a wider range than cumdist currently provides.
I've tried using
plot([0 N],cumsum(myVector))
but I have multiple entries which are the same value in my vector, and I can't work out how not to double count.
Here is some Python code which does what I want:
count = [x[0] for x in tests]
found = [x[1] for x in tests]
found.sort()
num = Counter(found)
freqs = [x for x in num.values()]
cumsum = [sum(item for item in freqs[0:rank+1]) for rank in xrange(len(freqs))]
normcumsum = [float(x)/numtests for x in cumsum]
tests is a list of numbers representing the number of times a dot product was done.
Here is an example of what I'm looking for:
[Image: example cumulative distribution]

To create a cumulative distribution, you cannot use cumsum on the vector directly. Do the following instead:
sortedVector = sort(myVector(:));
indexOfValueChange = [diff(sortedVector) ~= 0; true]; % logical mask: last occurrence of each distinct value
relativeCounts = (1:length(sortedVector))/length(sortedVector);
plot(sortedVector(indexOfValueChange),relativeCounts(indexOfValueChange))
EDIT
If your goal is just to modify the x-range of your plot,
xlim([0 120])
should do what you need.

Five hours and an answer already accepted, but if you're still interested in another answer...
What you're trying to do is obtain the empirical CDF of your data. Matlab's Statistics Toolbox, which you likely have, has a function to do exactly this in a statistically careful manner: ecdf. So all you actually need to do is
myVector = [100 102 101 99 98 100 101 110 102 101 100 99];
[Y,X] = ecdf(myVector);
figure;
plot(X,Y);
You can use stairs instead of plot to display the true shape of the empirical distribution.

Here is how I would do it:
myVector = [100 102 101 99 98 100 101 110 102 101 100 99];
N = numel(myVector);
x = sort(myVector);
y = 1:N;
[xplot , idx] = unique(x,'last')
yplot = y(idx)/N
stairs(xplot,yplot)
%Optionally
xfull = [0 xplot 120]
yfull = [0 yplot 1]
stairs(xfull,yfull)
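
As a toolbox-free cross-check of the answers above, the empirical CDF can also be evaluated directly on a fixed grid covering the 0 to 120 range from the question. This is only a minimal sketch; the names xGrid, F and pFewerThan99 are illustrative and do not come from the original posts.

```matlab
myVector = [100 102 101 99 98 100 101 110 102 101 100 99];

% Evaluation points covering the range asked for in the question
xGrid = 0:120;

% Fraction of samples less than or equal to each grid value
% (mean of a logical vector gives the proportion of true entries)
F = arrayfun(@(t) mean(myVector <= t), xGrid);

stairs(xGrid, F);
xlim([0 120]);
ylim([0 1]);

% "Fewer than 99" is a strict inequality, so it is computed separately
pFewerThan99 = mean(myVector < 99);
```

Repeated values need no special handling here, because each grid point simply counts how many samples lie at or below it.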

Get points which are within a given distance in two different matrices

I have two matrices A and B, in which the number of rows can vary. A and B do not necessarily have the same number of rows.
For example:
A = [ 110 90
130 140
230 50
370 210 ];
B = [ 321 95
102 35
303 200 ];
Now matrix A and B have 'corresponding points'. Corresponding points are rows where the values in the 2nd column of both matrices are within +/-20.
For example:
A(1,2) = 90 and B(1,2) = 95, the difference is within +/-20 so A(1,:) and B(1,:) are corresponding points.
A(2,2) = 140 and B(2,2) = 35, the difference is not within +/-20 so A(2,:) and B(2,:) are not corresponding points.
A(3,2) = 50 and B(2,2) = 35, the difference is within +/-20 so A(3,:) and B(2,:) are corresponding points.
Using this I want to store the corresponding points of A and B in C and D respectively. For the above example, the final matrices should look like this:
C = [ 110 90
230 50
370 210 ]
D = [ 321 95
102 35
303 200 ]
You can get all of the distances using pdist2
dists = pdist2( A(:,2), B(:,2) )
>> dists = [ 5 55 110
45 105 60
45 15 150
115 175 10 ]
Then get the indices of all 'corresponding points', as defined by a threshold of 20.
% Get combinations within tolerance
idx = dists < 20;
% Get indices
[iA, iB] = find(idx);
Then you can create the final matrices
C = A(iA, :);
D = B(iB, :);
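
If pdist2 (Statistics Toolbox) is not available, the same distance matrix can be sketched with implicit expansion, which needs R2016b or later. The variable tol and the inclusive comparison (<=) are assumptions made here to match the question's "within +/-20" wording; the answer above uses a strict < instead, which gives the same result for the sample data.

```matlab
A = [110  90
     130 140
     230  50
     370 210];
B = [321  95
     102  35
     303 200];

tol = 20;  % assumed tolerance, taken from the question's +/-20

% Pairwise absolute differences between the second columns
% (column vector minus row vector expands to a 4-by-3 matrix)
dists = abs(A(:,2) - B(:,2).');

% Row index into A and row index into B for every pair within tolerance
[iA, iB] = find(dists <= tol);

C = A(iA, :);
D = B(iB, :);
```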
Edit: One way to ensure each pairing is unique (i.e. A(1,:) cannot be paired with multiple rows from B) would be to keep only the minimum dists for each row/column. Note: this would still give you duplicate matches if two distances are exactly the same; you haven't defined how this should be handled.
dists = pdist2( A(:,2), B(:,2) );
% Set values which are greater than the row/column minima to be infinity.
% This means they will never be within the tolerance of 20 (or whatever else)
dists( bsxfun(@gt, dists, min(dists,[],1)) | bsxfun(@gt, dists, min(dists,[],2)) ) = Inf;
% In MATLAB R2016b and later, you can use implicit expansion to replace bsxfun
% That would be: dists( dists > min(dists,[],1) | dists > min(dists,[],2) ) = Inf;
% Now continue as before
[iA, iB] = find( dists < 20 );
C = A(iA, :);
D = B(iB, :);

Matlab - finding the values in a vector making a neighborhood chain

I have a vector that has values, say a=[10 20 42 90] and what I am trying to do is to find the neighbors in the range of 30 and replace these values with their means. For example, for the a vector, the value of 20 is a neighbor of 10. Additionally, 42 is also a neighbor of 10 through 20, because it is a neighbor's neighbor but 90 is not a neighboring value and it is not reachable from 10 with a neighborhood size of 30.
So I want to replace all 10, 20 and 42 with their means and obtain the vector a=[24 90].
If a=[10 20 42 66 155], then the resulting vector would be a=[34.5 155].
How do I achieve that?
a=[10 20 42 66 155]; % sample data
r = 30; % sample range
a = accumarray(cumsum([r+1 abs(diff(a))]>r).',a,[],@mean).';
Ungolfed and commented version:
a=[10 20 42 66 155]; % sample data
r = 30; % range
% difference between subsequent groupmembers. First difference is set to be higher than r
d = [r+1 abs(diff(a))];
% each group one label
L = cumsum(d>r);
% calculate mean of each group
a = accumarray(L.',a,[],@mean).';
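
The grouping above relies on neighbouring values being adjacent in the vector, which holds for the sorted samples in the question. For unsorted input, a minimal sketch (assuming the order of the output does not matter) is to sort first; the input vector and the names s and groupMeans are illustrative only.

```matlab
a = [42 10 90 20];   % hypothetical unsorted input, same values as the question's first example
r = 30;              % neighbourhood range

s = sort(a);                 % make chain members adjacent
d = [r+1 diff(s)];           % gaps between successive values; the first entry forces a new group
L = cumsum(d > r);           % group label for every element
groupMeans = accumarray(L(:), s(:), [], @mean).';
```

For this input the groups are {10, 20, 42} and {90}, so groupMeans is [24 90], matching the result requested in the question.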

How to sample a plot in Matlab?

The plot in MATLAB looks like this:
The code to generate this is very simple:
y = [0 18 450];
x = [0 5.3 6.575];
plot(x,y);
How could I know the values of 119 equally spaced discrete points on this plot?
In simple MATLAB plots, the points are connected together by simple linear interpolation. Simply put, a straight line is drawn between each pair of points. You can't physically get these points from the graph other than those you used to plot the points (at least not easily...).
If, however, you do want 119 points at equally spaced intervals that would theoretically be obtained from the above set of 3 points, you can use the interp1 function to do so:
y = [0 18 450];
x = [0 5.3 6.575];
yy = interp1(x, y, linspace(min(x),max(x),119), 'linear');
interp1 performs linear interpolation (note the 'linear' flag at the end) between the key points defined by x and y, evaluates it at the query x values passed as the third argument, and stores the interpolated y values in yy. Here linspace generates 119 linearly spaced values from the smallest value in x to the largest value in x.
Here's a running example with your data:
>> format compact;
>> y = [0 18 450];
>> x = [0 5.3 6.575];
>> yy = interp1(x, y, linspace(min(x),max(x),119), 'linear');
>> yy
yy =
Columns 1 through 8
0 0.1892 0.3785 0.5677 0.7570 0.9462 1.1354 1.3247
Columns 9 through 16
1.5139 1.7031 1.8924 2.0816 2.2709 2.4601 2.6493 2.8386
Columns 17 through 24
3.0278 3.2171 3.4063 3.5955 3.7848 3.9740 4.1633 4.3525
Columns 25 through 32
4.5417 4.7310 4.9202 5.1094 5.2987 5.4879 5.6772 5.8664
Columns 33 through 40
6.0556 6.2449 6.4341 6.6234 6.8126 7.0018 7.1911 7.3803
Columns 41 through 48
7.5696 7.7588 7.9480 8.1373 8.3265 8.5157 8.7050 8.8942
Columns 49 through 56
9.0835 9.2727 9.4619 9.6512 9.8404 10.0297 10.2189 10.4081
Columns 57 through 64
10.5974 10.7866 10.9759 11.1651 11.3543 11.5436 11.7328 11.9220
Columns 65 through 72
12.1113 12.3005 12.4898 12.6790 12.8682 13.0575 13.2467 13.4360
Columns 73 through 80
13.6252 13.8144 14.0037 14.1929 14.3822 14.5714 14.7606 14.9499
Columns 81 through 88
15.1391 15.3283 15.5176 15.7068 15.8961 16.0853 16.2745 16.4638
Columns 89 through 96
16.6530 16.8423 17.0315 17.2207 17.4100 17.5992 17.7885 17.9777
Columns 97 through 104
34.6540 53.5334 72.4128 91.2921 110.1715 129.0508 147.9302 166.8096
Columns 105 through 112
185.6889 204.5683 223.4477 242.3270 261.2064 280.0857 298.9651 317.8445
Columns 113 through 119
336.7238 355.6032 374.4826 393.3619 412.2413 431.1206 450.0000
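
To see which x value each sampled y belongs to, the query points can be kept in their own variable and paired with the result. This is a small sketch on the same data; the names xx and samples are illustrative.

```matlab
x = [0 5.3 6.575];
y = [0 18 450];

% 119 equally spaced query points across the plotted x range
xx = linspace(min(x), max(x), 119);
yy = interp1(x, y, xx, 'linear');

% One row per sample: [x value, interpolated y value]
samples = [xx(:) yy(:)];

% Check whether the corner of the plot is one of the query points
kinkIsSampled = any(xx == 5.3);
```

With 119 points the corner at x = 5.3 falls between two query points, so the sampled curve passes close to, but not exactly through, the point (5.3, 18).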

Initialization of population for genetic algorithm in matlab

I randomly generated an initial set of 10 populations (each of size n) for a genetic algorithm as follows:
for i = 1:10
for j=1:n
population(i,j)=randi([MinIntensity,MaxIntensity]);
end
end
Assume that I have the values of one population. For example, let the first population of size 10 be [100 110 120 130 140 150 160 170 180 190]. Is it possible to generate the remaining 9 populations such that their values are near those of the first population? (This is for quick convergence of the genetic algorithm.) Also, each population is a grayscale image with intensity values stored in row-major order, so the intensity values should be in the range 0 to 255.
Please help. Thanks in advance.
You can do one thing. Copy the first string as-is into each of the remaining 9 strings, then randomly generate an index (between 1 and n) and assign a random integer only to the position at that index.
population(1,:) = [100 110 120 130 140 150 160 170 180 190];
for i = 2:10
idx = randi([1 10]);
population(i,:) = population(1,:);
population(i,idx) = randi([0 255]);
end
With this you will get ten strings, each differing from the first in at most one position.
Edit: Image.
Assuming you have an M-by-N image, create a mask, for example
randi([-10 10], M, N)
Now add this to your original image. You get a new image in which every pixel is modified, but only by an amount between -10 and 10. Some pixel values might go out of range; in that case, clamp them as below
x(x < 0) = 0; % here x is your new image
x(x > 255) = 255;
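
Putting the mask idea together for the whole initial set, one possible sketch is shown below. It assumes each row is one individual and that a perturbation of at most 10 intensity levels counts as "near"; both the value of maxShift and the variable names are illustrative choices, not part of the original answer.

```matlab
base = [100 110 120 130 140 150 160 170 180 190];  % the known first individual
nIndividuals = 10;
maxShift = 10;   % assumed neighbourhood size, in intensity levels

% Start with 10 copies of the base individual
population = repmat(base, nIndividuals, 1);

% Perturb every row except the first by integers in [-maxShift, maxShift]
noise = randi([-maxShift maxShift], nIndividuals - 1, numel(base));
population(2:end, :) = population(2:end, :) + noise;

% Clamp to the valid grayscale range
population = min(max(population, 0), 255);
```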

Random sampling from gridded data: How to implement this in Matlab?

I have a 200x200 gridded data points. I want to randomly pick 15 grid points from that grid and replace the values in those grids with values selected from a known distribution shown below. All 15 grid points are assigned random values from the given distribution.
The given distribution is:
Given Distribution
314.52
1232.8
559.93
1541.4
264.2
1170.5
500.97
551.83
842.16
357.3
751.34
583.64
782.54
537.28
210.58
805.27
402.29
872.77
507.83
1595.1
The given distribution is made up from 20 values, which are part of those gridded data points. These 20 grid points are fixed i.e. they must not be part of randomly picking 15 points. The coordinates of these 20 points, which are fixed and should not be part of random picking, are:
x 27 180 154 183 124 146 16 184 138 122 192 39 194 129 115 33 47 65 1 93
y 182 81 52 24 168 11 90 153 133 79 183 25 63 107 161 14 65 2 124 79
Can someone help with how to implement this problem in Matlab?
Building off of my answer to your simpler question, here is a solution for how you can choose 15 random integer points (i.e. subscripted indices into your 200-by-200 matrix) and assign random values drawn from your set of values given above:
mat = [...]; %# Your 200-by-200 matrix
x = [...]; %# Your 20 x coordinates given above
y = [...]; %# Your 20 y coordinates given above
data = [...]; %# Your 20 data values given above
fixedPoints = [x(:) y(:)]; %# Your 20 points in one 20-by-2 matrix
randomPoints = randi(200,[15 2]); %# A 15-by-2 matrix of random integers
isRepeated = ismember(randomPoints,fixedPoints,'rows'); %# Find repeated sets of
%# coordinates
while any(isRepeated)
randomPoints(isRepeated,:) = randi(200,[sum(isRepeated) 2]); %# Create new
%# coordinates
isRepeated(isRepeated) = ismember(randomPoints(isRepeated,:),...
fixedPoints,'rows'); %# Check the new
%# coordinates
end
newValueIndex = randi(20,[1 15]); %# Select 15 random indices into data
linearIndex = sub2ind([200 200],randomPoints(:,1),...
randomPoints(:,2)); %# Get a linear index into mat
mat(linearIndex) = data(newValueIndex); %# Update the 15 points
In the above code I'm assuming that the x coordinates correspond to row indices and the y coordinates correspond to column indices into mat. If it's actually the other way around, swap the second and third inputs to the function SUB2IND.
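
An alternative that avoids the retry loop is to draw from the linear indices that remain after removing the fixed points. This is only a sketch under the same row/column assumption as above; mat is filled with zeros here as a hypothetical stand-in for the real grid. Unlike independent randi draws, sampling with randperm also guarantees that the 15 chosen points are distinct from each other.

```matlab
gridSize = [200 200];
mat = zeros(gridSize);   % hypothetical stand-in for the real 200-by-200 data

x = [27 180 154 183 124 146 16 184 138 122 192 39 194 129 115 33 47 65 1 93];
y = [182 81 52 24 168 11 90 153 133 79 183 25 63 107 161 14 65 2 124 79];
data = [314.52 1232.8 559.93 1541.4 264.2 1170.5 500.97 551.83 842.16 357.3 ...
        751.34 583.64 782.54 537.28 210.58 805.27 402.29 872.77 507.83 1595.1];

% Linear indices of the 20 fixed points (x taken as row, y as column)
fixedIdx = sub2ind(gridSize, x, y);

% Every other grid point is a candidate
candidates = setdiff(1:prod(gridSize), fixedIdx);

% Pick 15 distinct candidates without replacement
pick = candidates(randperm(numel(candidates), 15));

% Assign values drawn (with replacement) from the given distribution
mat(pick) = data(randi(numel(data), [1 15]));
```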
I think yoda already gave the basic idea. Call randi twice to get the grid coordinate to replace, and then replace it with the appropriate value. Do that 15 times.