How do I generate random numbers in such range? [duplicate] - matlab

This question already has answers here:
Generate random number with given probability matlab
(7 answers)
Draw random numbers from pre-specified probability mass function in Matlab
(1 answer)
Closed 3 years ago.
I want to generate random numbers that follow this distribution:
10% of them are in Category A (T = 6),
40% of them are in Category B (T = 8),
40% of them are in Category C (T = 10),
10% of them are in Category D (T = 12).
I just started learning MATLAB. I tried rand(x) and randn(x), but it seems neither of them can do that?

You'll have to set up a mapping from the uniformly distributed random numbers you get from rand to your desired values, according to the wanted distribution.
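A minimal sketch of such a mapping, assuming you only need the T values: compare each uniform sample against the cumulative probabilities. A sample below 0.1 maps to 6, one in [0.1, 0.5) maps to 8, and so on. The variable names and the sample count are illustrative. bsxfun is used so the code also runs on releases without implicit expansion.

```matlab
% Category values and probabilities from the question
T = [6 8 10 12];
p = [0.1 0.4 0.4 0.1];
cp = cumsum(p);                      % cumulative thresholds: 0.1 0.5 0.9 1.0

n = 10000;                           % number of samples (arbitrary choice)
u = rand(n, 1);                      % uniform samples in (0, 1)

% For each sample, count how many of the thresholds 0.1, 0.5, 0.9 it has
% reached; adding 1 gives the category index 1..4
idx = sum(bsxfun(@ge, u, cp(1:end-1)), 2) + 1;
values = T(idx);                     % the T value of each sample

% Empirical frequencies of 6, 8, 10, 12 should be close to p
freq = arrayfun(@(t) mean(values == t), T)
```

If you have the Statistics and Machine Learning Toolbox, randsample(T, n, true, p) draws weighted samples with replacement in a single call.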
In my solution, I generate random numbers using rand and map them to the integers 1, 2, 3, 4 as well as to the (categorical) characters A, B, C, D. I built a whole function that accepts a variable number of input arguments to mimic the behaviour of rand.
Here's the code of the myRand function:
function [rn, in, ch] = myRand(varargin)
  % No input arguments: return a single random number.
  if (numel(varargin) == 0)
    rn = rand();
  % One input argument; might be a scalar or an array.
  elseif (numel(varargin) == 1)
    a = varargin{1};
    if (~isnumeric(a))
      error('myRand: argument must be numeric');
    end
    rn = rand(a);
  % More than one input argument; must be scalars.
  elseif (numel(varargin) > 1)
    if (~all(cellfun(@(x) isnumeric(x), varargin)))
      error('myRand: arguments must be numeric');
    end
    if (~all(cellfun(@(x) isscalar(x), varargin)))
      error('myRand: arguments must be scalar');
    end
    rn = rand(varargin{:});
  end
  % Map the uniform values to the integers 1..4 (10% / 40% / 40% / 10%).
  in = zeros(size(rn));
  in((0 <= rn) & (rn < 0.1)) = 1;
  in((0.1 <= rn) & (rn < 0.5)) = 2;
  in((0.5 <= rn) & (rn < 0.9)) = 3;
  in((0.9 <= rn) & (rn < 1)) = 4;
  % Map the same intervals to the category labels A..D.
  ch = cell(size(rn));
  ch((0 <= rn) & (rn < 0.1)) = { 'A' };
  ch((0.1 <= rn) & (rn < 0.5)) = { 'B' };
  ch((0.5 <= rn) & (rn < 0.9)) = { 'C' };
  ch((0.9 <= rn) & (rn < 1)) = { 'D' };
end
And here's some test code with the corresponding outputs:
% Single random number with integer and category
[rn, in, ch] = myRand()
% Multiple random numbers with integers and categories (array input)
[rn, in, ch] = myRand([2, 3])
% Multiple random numbers with integers and categories (multiple scalars input)
[rn, in, ch] = myRand(2, 3)
rn = 0.19904
in = 2
ch =
{
[1,1] = B
}
rn =
0.206294 0.420426 0.835194
0.793874 0.593371 0.034055
in =
2 2 3
3 3 1
ch =
{
[1,1] = B
[2,1] = C
[1,2] = B
[2,2] = C
[1,3] = C
[2,3] = A
}
rn =
0.96223 0.87840 0.49925
0.54890 0.88436 0.92096
in =
4 3 2
3 3 4
ch =
{
[1,1] = D
[2,1] = C
[1,2] = C
[2,2] = C
[1,3] = B
[2,3] = D
}
Hope that helps!
Disclaimer: I tested the code with Octave 5.1.0, but I'm quite sure that it should be fully MATLAB-compatible. If not, please leave a comment, and I'll try to fix possible issues.


Left and right sides have a different number of elements when trying to define conditions of a periodic signal

I'm trying to plot a periodic signal that has an exponential part and time conditions, but I'm getting an error on the line one_period(-5 <= t1 & t1 < 0) = exp(10*t1 - 10);.
I'm still very new to MATLAB, so I'm unsure of how to fix this error.
T = 1;
t1 = linspace(0, T, 100 + 1);
t1(end) = [];
one_period = zeros(size(t1));
one_period(-5 <= t1 & t1 < 0) = exp(10*t1 - 10);
one_period(0 <= t1 & t1 < 5) = 10;
signal = repmat(one_period, 1, 5);
signal_length = 10;
t_signal_length = linspace(0, T*signal_length, signal_length*100 + 1);
t_signal_length(end) = [];
figure;
plot (t_signal_length, signal);
You are getting an error on this line:
one_period(-5 <= t1 & t1 < 0) = exp(10*t1 - 10);
Because t1 is defined as
T = 1;
t1 = linspace(0, T, 100 + 1);
t1(end) = [];
So it is an array of 100 values in [0, 1): linspace creates 101 evenly spaced points from 0 to 1, and the last one (1) is then removed.
The condition you've used on the erroneous line is not satisfied by any element of t1. Which values are you expecting to lie between -5 and 0 in an array whose values all lie between 0 and 1?
Therefore -5 <= t1 & t1 < 0 is an array where every value is false, i.e. you are assigning into zero elements of one_period, but the right-hand side of the = has as many values as t1. One of these does not fit into the other!
MATLAB has pretty good debugging tools, which allow you to add breakpoints, run snippets of code whilst in debug, and view variables as the code progresses. You need to have a clear idea of your expected behaviour and then run through from a breakpoint to identify which part is failing.
If you had a different condition (one that actually has some true values), then you probably just need to use this index on both sides of the assignment, so that the number of values matches the number of indices. This would look something like:
bValid = 0.5 <= t1 & t1 < 0.8; % some condition for a subset within (0,1)
one_period(bValid) = exp(10*t1(bValid) - 10); % Note indexing t1(bValid) too
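To see both the failure and the fix in isolation, here is a small sketch built on the question's t1. The try/catch only exists to demonstrate that the original assignment errors; the mask name is illustrative.

```matlab
T = 1;
t1 = linspace(0, T, 100 + 1);
t1(end) = [];                        % 100 samples in [0, 1)
one_period = zeros(size(t1));

mask = -5 <= t1 & t1 < 0;            % all false: t1 has no negative values
fprintf('mask selects %d elements, right side has %d\n', nnz(mask), numel(t1));

% Assigning 100 values into 0 selected positions raises an error
try
  one_period(mask) = exp(10*t1 - 10);
  failed = false;
catch
  failed = true;
end

% Indexing both sides with the same mask keeps the element counts equal
bValid = 0.5 <= t1 & t1 < 0.8;
one_period(bValid) = exp(10*t1(bValid) - 10);
```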
T = 1; %-signal period
t1 = linspace(0, T, 100 + 1); %-time series from 0 to T
t1(end) = []; %-remove last value
one_period = zeros(size(t1)); %-make function array (all zeros)
one_period(0 <= t1 & t1 < 0.5) = exp(10*t1(0 <= t1 & t1 < 0.5) - 10); %-define the signal
one_period(0.5 <= t1 & t1 < 1) = exp(10*t1(0.5 <= t1 & t1 < 1) - 10);
signal_length = 5;
signal = repmat(one_period, 1, signal_length); %-replicate the signal 5 times
t_signal_length = linspace(0, T*signal_length, signal_length*100 + 1); %-replicate the time series
t_signal_length(end) = [];
figure; %plot
plot (t_signal_length, signal);
Example plot: https://imgur.com/a/XhZnqId
As always, comments are helpful for knowing what your intentions were, but the code above is what I think you were trying to achieve.

Hash function that returns the same hash for a sum even if different terms lead to the same sum

Let's say I have:
n = 14
n is the result of the following sums of integers:
[5, 2, 7] -> 5 + 2 + 7 = 14 = n
[3, 4, 5, 2] -> 3 + 4 + 5 + 2 = 14 = n
[1, 13] -> 1 + 13 = 14 = n
[13, 1] -> 13 + 1 = 14 = n
[4, 3, 5, 2] -> 4 + 3 + 5 + 2 = 14 = n
...
I would need a hash function h so that:
h([5, 2, 7]) = h([3, 4, 5, 2]) = h([1, 13]) = h([13, 1]) = h([4, 3, 5, 2]) = h(...)
I.e. the order of the integer terms doesn't matter: as long as their integer sum is the same, their hash should also be the same.
I need to do this without computing the sum n, because the terms as well as n can be very large and can easily overflow (they don't fit in the bits of an int); that's why I am asking this question.
Are you aware of, or do you have an insight into, how I can implement such a hash function?
Given a list/sequence of integers, this hash function must return the same hash if the sum of the integers would be the same, but without computing the sum.
Thank you for your attention.
EDIT: I elaborated on @derpirscher's answer and modified his function a bit further, as I had collisions on multiples of BIG_PRIME (this example is in JavaScript):
function hash(seq) {
const BIG_PRIME = 999999999989;
const MAX_SAFE_INTEGER_DIV_2_FLOOR = Math.floor(Number.MAX_SAFE_INTEGER / 2);
let h = 0;
for (let i = 0; i < seq.length; i++) {
let value = seq[i];
if (h > MAX_SAFE_INTEGER_DIV_2_FLOOR) {
h = h % BIG_PRIME;
}
if (value > MAX_SAFE_INTEGER_DIV_2_FLOOR) {
value = value % BIG_PRIME;
}
h += value;
}
return h;
}
My question now would be: what do you think about this function? Are there some edge cases I didn't take into account?
Thank you.
EDIT 2:
With the above function, hash([1,2]) and hash([4504 * BIG_PRIME + 1, 4504 * BIG_PRIME + 2]) collide, as mentioned by @derpirscher.
Here is another modified version of the above function, which applies the modulo % BIG_PRIME to only one of the two terms if either of them is greater than MAX_SAFE_INTEGER_DIV_2_FLOOR:
function hash(seq) {
const BIG_PRIME = 999999999989;
const MAX_SAFE_INTEGER_DIV_2_FLOOR = Math.floor(Number.MAX_SAFE_INTEGER / 2);
let h = 0;
for (let i = 0; i < seq.length; i++) {
let value = seq[i];
if (
h > MAX_SAFE_INTEGER_DIV_2_FLOOR ||
value > MAX_SAFE_INTEGER_DIV_2_FLOOR
) {
if (h > MAX_SAFE_INTEGER_DIV_2_FLOOR) {
h = h % BIG_PRIME;
} else if (value > MAX_SAFE_INTEGER_DIV_2_FLOOR) {
value = value % BIG_PRIME;
}
}
h += value;
}
return h;
}
I think this version lowers the number of collisions a bit further.
What do you think? Thank you.
EDIT 3:
Even though I tried to elaborate on @derpirscher's answer, his implementation of hash is the correct one and the one to use.
Use his version if you need such a hash function.
You could calculate the sum modulo some big prime. If you want to stay within the range of int, you need to know the maximum integer in the language you are using. Then select a BIG_PRIME that's just below maxint / 2.
Assuming a 4-byte int, maxint = 2147483647, so the biggest prime < maxint/2 is 1073741789:
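int hash(int[] seq) {
    const int BIG_PRIME = 1073741789;
    int h = 0;
    for (int i = 0; i < seq.Length; i++) {
        int v = seq[i] % BIG_PRIME;
        if (v < 0) v += BIG_PRIME;  // keep the remainder in [0, BIG_PRIME) for negative terms
        h = (h + v) % BIG_PRIME;    // h + v < 2 * BIG_PRIME < maxint, so no overflow
    }
    return h;
}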
Since at every step both summands are below maxint/2, you won't get any overflow.
Edit
From a mathematical point of view, the following property which may be important for your use case holds:
(a + b + c + ...) % N == (a % N + b % N + c % N + ...) % N
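A quick numerical check of this property, written in MATLAB like the rest of this page. It is a sketch only: it uses doubles instead of a fixed-width int, and the sequence values are made up for illustration.

```matlab
N = 1073741789;                      % the BIG_PRIME from the answer
seq = [3*N + 5, 2*N + 9, 7];         % illustrative terms, two of them larger than N

% Reduce after every step, as the answer's loop does
h = 0;
for k = 1:numel(seq)
  h = mod(h + mod(seq(k), N), N);
end

% The full sum is still exact here (doubles hold integers up to 2^53),
% so the identity can be checked directly
assert(h == mod(sum(seq), N));

% A collision: the sequence [21] has a different sum but the same hash
fprintf('hash = %d, hash of [21] = %d\n', h, mod(21, N));
```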
But of course, as with every hash function, you will have collisions. You can't have a hash function without collisions, because the size of the domain (i.e. the number of possible input values) is generally much bigger than the size of the codomain (i.e. the number of possible output values).
For your example, the size of the domain is (in principle) infinite, as your sequence can contain any count of numbers from 1 to 2000000000. But your codomain has at most ~2000000000 elements (i.e. the non-negative range of int); with the modulo above, only the values 0 to BIG_PRIME - 1 can actually occur.

Matlab recursion

I need a little help understanding what is happening in this function, particularly line 7, [Fnm1,Fnm2] = fibrecurmemo(N-1);. I don't understand how a new variable can be declared here within the array. An example of what is happening would be appreciated.
function [Fn,Fnm1] = fibrecurmemo(N)
% Computes the Fibonacci number, F(N), using a memoized recursion
if N <= 2
Fn = 1;
Fnm1 = 1;
else
[Fnm1,Fnm2] = fibrecurmemo(N-1);
Fn = Fnm1 + Fnm2;
end
end
Say we start with:
fibrecurmemo(3) %// N is 3
The else statements run (since N > 2):
[Fnm1,Fnm2] = fibrecurmemo(2); %//statement 1
Fn = Fnm1 + Fnm2; %//statement 2
Before statement 2 can run, fibrecurmemo(2) must first run.
The if statements in fibrecurmemo(2) run (since N <= 2):
Fn = 1;
Fnm1 = 1;
As a result, fibrecurmemo(2) returns 1, 1.
Continuing from statement 1 above,
[1,1] = fibrecurmemo(2); %//statement 1
Fn = 1 + 1; %//statement 2
Finally,
[2, 1] = fibrecurmemo(3);
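To see the whole chain at once, here is a hypothetical loop that unrolls the recursion by hand. It is not the original function; it just repeats, for N = 3 to 6, what each call does with the pair [F(N-1), F(N-2)] it receives from the call below it.

```matlab
% Base case: fibrecurmemo(2) returns [Fn, Fnm1] = [1, 1]
Fn = 1;
Fnm1 = 1;
for N = 3:6
  % What [Fnm1,Fnm2] = fibrecurmemo(N-1) hands back to the call for N
  Fnm2 = Fnm1;
  Fnm1 = Fn;
  % Statement 2 of the else branch
  Fn = Fnm1 + Fnm2;
  fprintf('fibrecurmemo(%d) returns [%d, %d]\n', N, Fn, Fnm1);
end
```

This prints [2, 1], [3, 2], [5, 3] and [8, 5]; the first pair matches the trace for fibrecurmemo(3) above.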
The function returns two values.
function [xFive,yFive] = addFive(x,y)
xFive = x + 5;
yFive = y + 5;
end
xx = (addFive(3,4))
In this example, xx will be equal to 8.
The syntax for assigning multiple return values is
[a,b,c,...] = someFunc();
where someFunc() returns the outputs [a,b,c,...].
[aa,bb] = addFive(3,4);
cc = addFive(3,4);
If you do it this way, you get
aa == 8
bb == 9
cc == 8
In the case of cc (a single variable instead of [aa,bb]), you just get the first return value.
For example, you could do
x = fibrecurmemo(5)
[y,z] = fibrecurmemo(5)
In this case, x == y.
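The same rule applies to built-in functions. For example, max can return both the maximum and its position. This sketch also shows the ~ placeholder, which skips an output you don't need; the vector v is arbitrary example data.

```matlab
v = [3 9 4];

[m, pos] = max(v);    % two outputs: m = 9 (the value), pos = 2 (its position)
m1 = max(v);          % one output: only the first return value, 9
[~, pos2] = max(v);   % ~ discards the first output and keeps only the position, 2
```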

Faster way to group data in the same quartile range

Consider a column of a 10 x 10 matrix K, say K(:,1)
I would like to create a 10x4 binary matrix which tells us which quarter range the row entry belongs to. For example
ith row of binary matrix: [ 1 0 0 0 ] => K(i,1) < prctile(K(:,1),25)
My code:
%%%
K = randi(10,10);
BINMAT = zeros(size(K,1),4);
y_1 = prctile(K(:,1),25) ;
ID_1 = find(K(:,1) < y_1);
BINMAT(ID_1,1)=1;
y_2 = prctile(K(:,1),50);
ID_2 = find(( K(:,1) > y_1 & K(:,1) < y_2 ));
BINMAT(ID_2,2)=1;
y_3 = prctile(K(:,1),75);
ID_3 = find(( K(:,1) > y_2 & K(:,1) < y_3 ));
BINMAT(ID_3,3)=1;
y_4 = prctile(K(:,1),100);
ID_4 = find((K(:,1) > y_3 & K(:,1) < y_4 ));
BINMAT(ID_4,4)=1;
%%%
If I have to do this not just for one column but for a set of columns, say A = [1 2 5 6], BINMAT should have 16 columns (4 for each selected column of K). Is there a faster way to do this?
You can use a for loop that iterates over the desired column indexes given by A:
K = randi(10,10);
A = [1 2 5 6]; % columns in K to process
BINMAT = zeros(size(K,1), 4*length(A));
cnt = 0; % helper
for col_indx = A
y_1 = prctile(K(:,col_indx),25) ;
ID_1 = find(K(:,col_indx) < y_1);
BINMAT(ID_1, 4*cnt + 1) = 1;
y_2 = prctile(K(:,col_indx),50);
ID_2 = find(( K(:,col_indx) > y_1 & K(:,col_indx) < y_2 ));
BINMAT(ID_2, 4*cnt + 2)=1;
y_3 = prctile(K(:,col_indx),75);
ID_3 = find(( K(:,col_indx) > y_2 & K(:,col_indx) < y_3 ));
BINMAT(ID_3, 4*cnt + 3)=1;
y_4 = prctile(K(:,col_indx),100);
ID_4 = find((K(:,col_indx) > y_3 & K(:,col_indx) < y_4 ));
BINMAT(ID_4, 4*cnt + 4)=1;
cnt = cnt + 1;
end
I have noticed that many of the rows of BINMAT contain only zeros because the code you posted does not take values equal to y_1, y_2, y_3 and y_4 into account. I think you should use K(:,col_indx) >= y_1 ... and so on.
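A sketch of that fix, assuming half-open bins (x < y1, y1 <= x < y2, y2 <= x < y3, x >= y3) so that every row lands in exactly one bin per column. Like the question, it relies on prctile (Statistics and Machine Learning Toolbox in MATLAB; built into Octave). The loop variables are illustrative.

```matlab
K = randi(10, 10);
A = [1 2 5 6];                               % columns in K to process
BINMAT = zeros(size(K, 1), 4*numel(A));
rows = (1:size(K, 1)).';

for c = 1:numel(A)
  x = K(:, A(c));
  y = prctile(x, [25 50 75]);                % quartile boundaries of this column
  % Count how many boundaries each value has reached: bin 1..4,
  % with values equal to a boundary going to the upper bin
  bin = 1 + (x >= y(1)) + (x >= y(2)) + (x >= y(3));
  BINMAT(sub2ind(size(BINMAT), rows, 4*(c - 1) + bin)) = 1;
end
```

With this scheme no row is left all-zero, and each row has exactly one 1 in each block of four columns.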
Another suggestion:
K = randi(10,10)
p = 25:25:100;
Y = prctile(K, p);
Y = [zeros(1, size(Y, 2)) ;Y];
BINMAT = zeros(size(K, 1), length(p), size(K, 2));
for j = 1:size(K, 2)
for i = 1:length(p)
BINMAT(Y(i, j) <= K(:,j) & K(:, j) <= Y(i+1, j), i, j) = 1;
end
end
Then, BINMAT(:, :, i) is the binary matrix, as you defined it, for K(:, i).
Percentile is, at its heart, the position of an element in the sorted list. So using sort directly will provide the most efficient solution, since you want multiple percentiles out of multiple columns.
First we need a way to assign fixed bins to the sorted positions. Here's the vector that I think prctile uses, but since 10 doesn't split evenly into 4 bins, it's somewhat arbitrary (in other words, do you assign element 3 to the 0-25% bin or the 25%-50% bin?): floor(4*(0.5+(0:9).')/10)+1
Now we just need to sort each column, and assign the sort position of each original element to one of those positions. The second output of sort does most of the work:
K = randi(10,10);
A = [1 2 5 6]; % columns in K to process
BINMAT = zeros(size(K,1), 4*length(A));
n = size(K,1); % number of rows (10 here)
bins = floor(4*(0.5+(0:n-1).')/n)+1; % bin of each sorted position
[~, idx] = sort(K(:,A)); % idx(k,j): original row of the k'th smallest value in column A(j)
% The k'th sorted element belongs to bin bins(k). So now generate the output.
% We need to offset to the correct block of BINMAT for each column
offset_bins = bsxfun(@plus, bins, 4*(0:length(A)-1));
BINMAT(sub2ind(size(BINMAT), idx, offset_bins)) = 1;

forcing the columns of a matrix within different limits

I have a matrix named l of size 20x3.
This is what I want to do:
Suppose I have these limits:
l1_max=20; l1_min=0.5;
l2_max=20; l2_min=0.5;
mu_max=20; mu_min=0.5;
I wanted to force all the elements of the matrix l within the limits.
The values of 1st column within l1_max & l1_min.
The values of 2nd column within l2_max & l2_min.
The values of 3rd column within mu_max & mu_min.
This is what I did:
for k=1:20
if l(k,1)>l1_max
l(k,1) = l1_max;
elseif l(k,1)<l1_min
l(k,1) = l1_min;
end
if l(k,2)>l2_max
l(k,2) = l2_max;
elseif l(k,2)<l2_min
l(k,2) = l2_min;
end
if l(k,3)>mu_max
l(k,3) = mu_max;
elseif l(k,3)<mu_min
l(k,3) = mu_min;
end
end
Can it be done in a better way?
You don't have to loop over rows, use vectorized operations on entire columns:
l(l(:, 1) > l1_max, 1) = l1_max;
l(l(:, 1) < l1_min, 1) = l1_min;
Similarly:
l(l(:, 2) > l2_max, 2) = l2_max;
l(l(:, 2) < l2_min, 2) = l2_min;
l(l(:, 3) > mu_max, 3) = mu_max;
l(l(:, 3) < mu_min, 3) = mu_min;
An alternative method, which resembles Bas' idea, is to apply min and max as follows:
l(:, 1) = max(min(l(:, 1), l1_max), l1_min);
l(:, 2) = max(min(l(:, 2), l2_max), l2_min);
l(:, 3) = max(min(l(:, 3), mu_max), mu_min);
It appears that both approaches have comparable performance.
You don't even have to loop over all columns, the operation on the whole matrix can be done in 2 calls to bsxfun, independent of the number of columns:
column_max = [l1_max, l2_max, mu_max];
column_min = [l1_min, l2_min, mu_min];
M = bsxfun(@min, M, column_max); %clip to maximum
M = bsxfun(@max, M, column_min); %clip to minimum
This uses two tricks: to clip a value between min_val and max_val, you can do clipped_x = min(max(x, min_val), max_val). The other trick is to use the somewhat obscure bsxfun, which applies a function after doing singleton expansion. When you use it on two matrices, it 'extrudes' the smallest one to the same size as the largest one before applying the function, so the example above is equivalent to M = min(M, repmat(column_max, size(M, 1), 1)), but hopefully calculated in a more efficient way.
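On MATLAB R2016b or later, two-input min and max support implicit expansion, so the bsxfun calls can be dropped (on older releases, keep bsxfun). A sketch with hypothetical data and per-column limits that differ from the question's, to make the column-wise effect visible:

```matlab
l = [0.1 25 3; 10 0.2 30; 21 5 -1];   % hypothetical 3x3 test data
column_min = [0.5 1 2];                % hypothetical per-column lower limits
column_max = [20 15 10];               % hypothetical per-column upper limits

% The 1x3 limit rows are expanded across all rows of l (implicit expansion)
clipped = min(max(l, column_min), column_max)
```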
Below is a benchmark to test the various methods discussed so far. I'm using the TIMEIT function found on the File Exchange.
function [t,v] = testClampColumns()
% data and limits ranges for each column
r = 10000; c = 500;
M = randn(r,c);
mn = -1.1 * ones(1,c);
mx = +1.1 * ones(1,c);
% functions
f = { ...
@() clamp1(M,mn,mx) ;
@() clamp2(M,mn,mx) ;
@() clamp3(M,mn,mx) ;
@() clamp4(M,mn,mx) ;
@() clamp5(M,mn,mx) ;
};
% timeit and check results
t = cellfun(@timeit, f, 'UniformOutput',true);
v = cellfun(@feval, f, 'UniformOutput',false);
assert(isequal(v{:}))
end
Given the following implementations:
1) loop over all values and compare against min/max
function M = clamp1(M, mn, mx)
for j=1:size(M,2)
for i=1:size(M,1)
if M(i,j) > mx(j)
M(i,j) = mx(j);
elseif M(i,j) < mn(j)
M(i,j) = mn(j);
end
end
end
end
2) compare each column against min/max
function M = clamp2(M, mn, mx)
for j=1:size(M,2)
M(M(:,j) < mn(j), j) = mn(j);
M(M(:,j) > mx(j), j) = mx(j);
end
end
3) truncate each column to the limits
function M = clamp3(M, mn, mx)
for j=1:size(M,2)
M(:,j) = min(max(M(:,j), mn(j)), mx(j));
end
end
4) vectorized version of truncation in (3)
function M = clamp4(M, mn, mx)
M = bsxfun(@min, bsxfun(@max, M, mn), mx);
end
5) absolute value comparison: -a < x < a <==> |x| < a
(Note: this is not applicable to your case, since it requires a symmetric limits range. I only included it for completeness. Besides, it turns out to be the slowest method.)
function M = clamp5(M, mn, mx)
assert(isequal(-mn,mx), 'Only works when -mn==mx')
idx = bsxfun(@gt, abs(M), mx);
v = bsxfun(@times, sign(M), mx);
M(idx) = v(idx);
end
The timings I get on my machine with an input matrix of size 10000x500:
>> t = testClampColumns
t =
0.2424
0.1267
0.0569
0.0409
0.2868
I would say that all the above methods are acceptably fast, with the bsxfun solution being the fastest :)