MATLAB-How can I randomly select smaller values with higher probabilities? - matlab

I have a column vector "distances", and I want to select a value randomly from this vector such that smaller values have a higher probability of being selected. So far I am using the following, where "possible_cells" is the randomly selected value:
w=(fliplr(1:numel(distances)))/100
possible_cells=randsample((sort(distances)),1,true,w)
Basically, I flipped the distance vector to create probabilities of selection "w" (if I am understanding randsample correctly), so that the smallest value has the probability of being selected equal to the highest value. To check how well this works, I randomly drew 50 values and by using a histogram, I see that the values are higher than I would expect. Does anyone have any idea on how else to do what I described above?
0 Comments

How about something like this?
let's start with 10 sample distances with lengths no greater than 20 just to demonstrate:
d = randi(20,10,1);
Next, since we want smaller values to be more likely, let's take the reciprocal of those distances:
d_rec = 1./d;
Now, let's normalize so we can create a distribution from which to select our distance:
d_rec_norm = d_rec ./ sum(d_rec);
This new variable reflects the probability with which to select each given distance. Now comes a little trick... we choose the distance like this:
d_i = find(rand < cumsum(d_rec_norm),1);
This will give us the index of our chosen distance. The logic behind this is that when cumulatively summing the normalized values associated with each distance (d_rec_norm) we create "bins" whose widths are proportional to the likelihood of selecting each distance. All that is left is to pick a random number between 0 and 1 (rand) and see which "bin" it falls in.
I'm a new poster here, so let me know if this is unclear and I can try to improve my explanation.

Related

Index when mean is constant

I am relatively new to matlab. I found the consecutive mean of a set of 1E6 random numbers that has mean and standard deviation. Initially the calculated mean fluctuate and then converges to a certain value.
I will like to know the index (i.e 100th position) at which the mean converges. I have no idea how to do that.
I tried using the logical operator but i have to go through 1e6 data points. Even with that i still can't find the index.
Y_c= sigma_c * randn(n_r, 1) + mu_c; %Random number creation
Y_f=sigma_f * randn(n_r, 1) + mu_f;%Random number creation
P_u=gamma*(B*B)/2.*N_gamma+q*B.*N_q + Y_c*B.*N_c; %Calculation of Ultimate load
prog_mu=cumsum(P_u)./cumsum(ones(size(P_u))); %Progressive Cumulative Mean of system response
logical(diff(prog_mu==0)); %Find index
I suspect the issue is that the mean will never truly be constant, but will rather fluctuate around the "true mean". As such, you'll most likely never encounter a situation where the two consecutive values of the cumulative mean are identical. What you should do is determine some threshold value, below which you consider fluctuations in the mean to be approximately equal to zero, and compare the difference of the cumulative mean to that value. For instance:
epsilon = 0.01;
const_ind = find(abs(diff(prog_mu))<epsilon,1,'first');
where epsilon will be the threshold value you choose. The find command will return the index at which the variation in the cumulative mean first drops below this threshold value.
EDIT: As was pointed out, this method may potentially fail if the first few random numbers are generated such that the difference between them is less than the epsilon value, but have not yet converged. I would like to suggest a different approach, then.
We calculate the cumulative means, as before, like so:
prog_mu=cumsum(P_u)./cumsum(ones(size(P_u)));
We also calculate the difference in these cumulative means, as before:
df_prog_mu = diff(prog_mu);
Now, to ensure that conversion has been achieved, we find the first index where the cumulative mean is below the threshold value epsilon and all subsequent means are also below the threshold value. To phrase this another way, we want to find the index after the last position in the array where the cumulative mean is above the threshold:
conv_index = find(~df_prog_mu,1,'last')+1;
In doing so, we guarantee that the value at the index, and all subsequent values, have converged below your predetermined threshold value.
I wouldn't imagine that the mean would suddenly become constant at a single index. Wouldn't it asymptotically approach a constant value? I would reccommend a for loop to calculate the mean (it sounds like maybe you've already done this part?) like this:
avg = [];
for k=1:length(x)
avg(k) = mean(x(1:k));
end
Then plot the consecutive mean:
plot(avg)
hold on % this will allow us to plot more data on the same figure later
If you're trying to find the point at which the consecutive mean comes within a certain range of the true mean, try this:
Tavg = 5; % or whatever your true mean is
err = 0.01; % the range you want the consecutive mean to reach before we say that it "became constant"
inRange = avg>(Tavg-err) & avg<(Tavg+err); % gives you a binary logical array telling you which values fell within the range
q = 1000; % set this as high as you can while still getting a value for consIndex
constIndex = [];
for k=1:length(inRange)
if(inRange(k) == sum(inRange(k:k+q))/(q-1);)
constIndex = k;
end
end
The below answer takes a similar approach but makes an unsafe assumption that the first value to fall within the range is the value where the function starts to converge. Any value could randomly fall within that range. We need to make sure that the following values also fall within that range. In the above code, you can edit "q" and "err" to optimize your result. I would recommend double checking it by plotting.
plot(avg(constIndex), '*')

How to change pixel values of an RGB image in MATLAB?

So what I need to do is to apply an operation like
(x(i,j)-min(x)) / max(x(i,j)-min(x))
which basically converts each pixel value such that the values range between 0 and 1.
First of all, I realised that Matlab saves our image(rows * col * colour) in a 3D matrix on using imread,
Image = imread('image.jpg')
So, a simple max operation on image doesn't give me the max value of pixel and I'm not quite sure what it returns(another multidimensional array?). So I tried using something like
max_pixel = max(max(max(Image)))
I thought it worked fine. Similarly I used min thrice. My logic was that I was getting the min pixel value across all 3 colour planes.
After performing the above scaling operation I got an image which seemed to have only 0 or 1 values and no value in between which doesn't seem right. Has it got something to do with integer/float rounding off?
image = imread('cat.jpg')
maxI = max(max(max(image)))
minI = min(min(min(image)))
new_image = ((I-minI)./max(I-minI))
This gives output of only 1s and 0s which doesn't seem correct.
The other approach I'm trying is working on all colour planes separately as done here. But is that the correct way to do it?
I could also loop through all pixels but I'm assuming that will be time taking. Very new to this, any help will be great.
If you are not sure what a matlab functions returns or why, you should always do one of the following first:
Type help >functionName< or doc >functionName< in the command window, in your case: doc max. This will show you the essential must-know information of that specific function, such as what needs to be put in, and what will be output.
In the case of the max function, this yields the following results:
M = max(A) returns the maximum elements of an array.
If A is a vector, then max(A) returns the maximum of A.
If A is a matrix, then max(A) is a row vector containing the maximum
value of each column.
If A is a multidimensional array, then max(A) operates along the first
array dimension whose size does not equal 1, treating the elements as
vectors. The size of this dimension becomes 1 while the sizes of all
other dimensions remain the same. If A is an empty array whose first
dimension has zero length, then max(A) returns an empty array with the
same size as A
In other words, if you use max() on a matrix, it will output a vector that contains the maximum value of each column (the first non-singleton dimension). If you use max() on a matrix A of size m x n x 3, it will result in a matrix of maximum values of size 1 x n x 3. So this answers your question:
I'm not quite sure what it returns(another multidimensional array?)
Moving on:
I thought it worked fine. Similarly I used min thrice. My logic was that I was getting the min pixel value across all 3 colour planes.
This is correct. Alternatively, you can use max(A(:)) and min(A(:)), which is equivalent if you are just looking for the value.
And after performing the above operation I got an image which seemed to have only 0 or 1 values and no value in between which doesn't seem right. Has it got something to do with integer/float rounding off?
There is no way for us to know why this happens if you do not post a minimal, complete and verifiable example of your code. It could be that it is because your variables are of a certain type, or it could be because of an error in your calculations.
The other approach I'm trying is working on all colour planes separately as done here. But is that the correct way to do it?
This depends on what the intended end result is. Normalizing each colour (red, green, blue) seperately will result in a different result as compared to normalizing the values all at once (in 99% of cases, anyway).
You have a uint8 RGB image.
Just convert it to a double image by
I=imread('https://upload.wikimedia.org/wikipedia/commons/thumb/0/0b/Cat_poster_1.jpg/1920px-Cat_poster_1.jpg')
I=double(I)./255;
alternatively
I=im2double(I); %does the scaling if needed
Read about image data types
What are you doing wrong?
If what you want todo is convert a RGB image to [0-1] range, you are approaching the problem badly, regardless of the correctness of your MATLAB code. Let me give you an example of why:
Say you have an image with 2 colors.
A dark red (20,0,0):
A medium blue (0,0,128)
Now you want this changed to [0-1]. How do you scale it? Your suggested approach is to make the value 128->1 and either 20->20/128 or 20->1 (not relevant). However when you do this, you are changing the color! you are making the medium blue to be intense blue (maximum B channel, ) and making R way way more intense (instead of 20/255, 20/128, double brightness! ). This is bad, as this is with simple colors, but with combined RGB values you may even change the color itsef, not only the intensity. Therefore, the only correct way to convert to [0-1] range is to assume your min and max are [0, 255].

How to calculate the "rest value" of a plot?

Didn't know how to paraphrase the question well.
Function for example:
Data:https://www.dropbox.com/s/wr61qyhhf6ujvny/data.mat?dl=0
In this case how do I calculate that the rest point of this function is ~1? I have access to the vector that makes the plot.
I guess the mean is an approximation but in some cases it can be pretty bad.
Under the assumption that the "rest" point is the steady-state value in your data and the fact that the steady-state value happens the majority of the times in your data, you can simply bin all of the points and use each unique value as a separate bin. The bin with the highest count should correspond to the steady-state value.
You can do this by a combination of histc and unique. Assuming your data is stored in y, do this:
%// Find all unique values in your data
bins = unique(y);
%// Find the total number of occurrences per unique value
counts = histc(y, bins);
%// Figure out which bin has the largest count
[~,max_bin] = max(counts);
%// Figure out the corresponding y value
ss_value = bins(max_bin);
ss_value contains the steady-state value of your data, corresponding to the most occurring output point with the assumptions I laid out above.
A minor caveat with the above approach is that this is not friendly to floating point data whose unique values are generated by floating point values whose decimal values beyond the first few significant digits are different.
Here's an example of your data from point 2300 to 2320:
>> format long g;
>> y(2300:2320)
ans =
0.99995724232555
0.999957488454868
0.999957733165346
0.999957976465197
0.999958218362579
0.999958458865564
0.999958697982251
0.999958935720613
0.999959172088623
0.999959407094224
0.999959640745246
0.999959873049548
0.999960104014889
0.999960333649014
0.999960561959611
0.999960788954326
0.99996101464076
0.999961239026462
0.999961462118947
0.999961683925704
0.999961904454139
Therefore, what I'd recommend is to perhaps round so that the first 5 or so significant digits are maintained.
You can do this to your dataset before you continue:
num_digits = 5;
y_round = round(y*(10^num_digits))/(10^num_digits);
This will first multiply by 10^n where n is the number of digits you desire so that the decimal point is shifted over by n positions. We round this result, then divide by 10^n to bring it back to the scale that it was before. If you do this, for those points that were 0.9999... where there are n decimal places, these will get rounded to 1, and it may help in the above calculations.
However, more recent versions of MATLAB have this functionality already built-in to round, and you can just do this:
num_digits = 5;
y_round = round(y,num_digits);
Minor Note
More recent versions of MATLAB discourage the use of histc and recommend you use histcounts instead. Same function definition and expected inputs and outputs... so just replace histc with histcounts if your MATLAB version can handle it.
Using the above logic, you could also use the median too. If the majority of data is fluctuating around 1, then the median would have a high probability that the steady-state value is chosen... so try this too:
ss_value = median(y_round);

Creating image profiles in some parts of the image

I've been struggling with a problem for a while:) in Matlab.
I have an image (A.tif) in which I would like to find maxima (with defined threshold) but more specific coordinates of these maxima. My goal is to create short profiles on the image crossing these maxima (let say +- 20 pixels on both sides of the maximum)
I tried this:
[r c]=find(A==max(max(A)));
I suppose that r and c are coordinates of maximum (only one/first or every maximum?)
How can I implement these coordinates into ,for example improfile function?
I think it should be done using nested loops?
Thanks for every suggestion
Your code is working but it finds only global maximum coordinates.I would like to find multiple maxima (with defined threshold) and properly address its coordinates to create multiple profiles crossing every maximum found. I have little problem with improfile function :
improfile(IMAGE,[starting point],[ending point]) .
Lets say that I get [rows, columns] matrix with coordinates of each maximum and I'm trying to create one direction profile which starts in the same row where maximum is (about 20 pixels before max) and of course ends in the same row (also about 20 pixels from max) .
is this correct expression :improfile(IMAGE,[rows columns-20],[rows columns+20]); It plots something but it seems to only joins maxima rather than making intensity profiles
You're not giving enough information so I had to guess a few things. You should apply the max() to the vectorized image and store the index:
[~,idx] = max(I(:))
Then transform this into x and y coordinates:
[ix,iy] = ind2sub(size(I),idx)
This is your x and y of the maximum of the image. It really depends what profile section you want. Something like this is working:
I = imread('peppers.png');
Ir = I(:,:,1);
[~,idx]=max(Ir(:))
[ix,iy]=ind2sub(size(Ir),idx)
improfile(Ir,[0 ix],[iy iy])
EDIT:
If you want to instead find the k largest values and not just the maximum you can do an easy sort:
[~,idx] = sort(I(:),'descend');
idxk = idx(1:k);
[ix,iy] = ind2sub(size(I),idxk)
Please delete your "reply" and instead edit your original post where you define your problem better

Arbitrary distribution -> Uniform distribution (Probability Integral Transform?)

I have 500,000 values for a variable derived from financial markets. Specifically, this variable represents distance from the mean (in standard deviations). This variable has a arbitrary distribution. I need a formula that will allow me to select a range around any value of this variable such that an equal (or close to it) amount of data points fall within that range.
This will allow me to then analyze all of the data points within a specific range and to treat them as "similar situations to the input."
From what I understand, this means that I need to convert it from arbitrary distribution to uniform distribution. I have read (but barely understood) that what I am looking for is called "probability integral transform."
Can anyone assist me with some code (Matlab preferred, but it doesn't really matter) to help me accomplish this?
Here's something I put together quickly. It's not polished and not perfect, but it does what you want to do.
clear
randList=[randn(1e4,1);2*randn(1e4,1)+5];
[xCdf,xList]=ksdensity(randList,'npoints',5e3,'function','cdf');
xRange=getInterval(5,xList,xCdf,0.1);
and the function getInterval is
function out=getInterval(yPoint,xList,xCdf,areaFraction)
yCdf=interp1(xList,xCdf,yPoint);
yCdfRange=[-areaFraction/2, areaFraction/2]+yCdf;
out=interp1(xCdf,xList,yCdfRange);
Explanation:
The CDF of the random distribution is shown below by the line in blue. You provide a point (here 5 in the input to getInterval) about which you want a range that gives you 10% of the area (input 0.1 to getInterval). The chosen point is marked by the red cross and the
interval is marked by the lines in green. You can get the corresponding points from the original list that lie within this interval as
newList=randList(randList>=xRange(1) & randList<=xRange(2));
You'll find that on an average, the number of points in this example is ~2000, which is 10% of numel(randList)
numel(newList)
ans =
2045
NOTE:
Please note that this was done quickly and I haven't made any checks to see if the chosen point is outside the range or if yCdfRange falls outside [0 1], in which case interp1 will return a NaN. This is fairly straightforward to implement, and I'll leave that to you.
Also, ksdensity is very CPU intensive. I wouldn't recommend increasing npoints to more than 1e4. I assume you're only working with a fixed list (i.e., you have a list of 5e5 points that you've obtained somehow and now you're just running tests/analyzing it). In that case, you can run ksdensity once and save the result.
I do not speak Matlab, but you need to find quantiles in your data. This is Mathematica code which would do this:
In[88]:= data = RandomVariate[SkewNormalDistribution[0, 1, 2], 10^4];
Compute quantile points:
In[91]:= q10 = Quantile[data, Range[0, 10]/10];
Now form pairs of consecutive quantiles:
In[92]:= intervals = Partition[q10, 2, 1];
In[93]:= intervals
Out[93]= {{-1.397, -0.136989}, {-0.136989, 0.123689}, {0.123689,
0.312232}, {0.312232, 0.478551}, {0.478551, 0.652482}, {0.652482,
0.829642}, {0.829642, 1.02801}, {1.02801, 1.27609}, {1.27609,
1.6237}, {1.6237, 4.04219}}
Verify that the splitting points separate data nearly evenly:
In[94]:= Table[Count[data, x_ /; i[[1]] <= x < i[[2]]], {i, intervals}]
Out[94]= {999, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000}