x=rand(1,10); bins=discretize(x,0:0.25:1);
An instance of running the above line in Matlab R2020b produces the following outputs for x and bins.
x = 0.1576, 0.9706, 0.9572, 0.4854, 0.8003, 0.1419, 0.4218, 0.9157, 0.7922, 0.9595
bins = 1, 4, 4, 2, 4, 1, 2, 4, 4, 4
The in-built function discretize is not yet implemented in Octave. How can I achieve the same values of bins in OCTAVE? Can anyone enlighten me? I am using Octave 6.2.0.
You can use interp1 with the 'previous' option:
edges = 0:0.25:1;
x = [0.1576, 0.9706, 0.9572, 0.4854, 0.8003, 0.1419, 0.4218, 0.9157, 0.7922, 0.9595];
bins = interp1 (edges, 1:numel(edges), x, 'previous')
Another function that can be used is lookup:
bins = lookup(edges, x);
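As a quick sanity check, the two calls above can be compared against the bins that MATLAB's discretize reported in the question. This is only a sketch for Octave (lookup is an Octave function and is not assumed to exist in MATLAB); the variable names expected, bins_interp and bins_lookup are made up for illustration.

```matlab
% Octave sketch: compare interp1 ('previous') and lookup with the bins from the question.
edges = 0:0.25:1;
x = [0.1576, 0.9706, 0.9572, 0.4854, 0.8003, 0.1419, 0.4218, 0.9157, 0.7922, 0.9595];
expected = [1, 4, 4, 2, 4, 1, 2, 4, 4, 4];   % bins reported by discretize in the question

bins_interp = interp1(edges, 1:numel(edges), x, 'previous');  % index of the edge at or below x
bins_lookup = lookup(edges, x);                               % same index via table lookup

disp(isequal(bins_interp, expected))
disp(isequal(bins_lookup, expected))
```

Both approaches return the index of the last edge that is less than or equal to each value, which is why they agree for values strictly inside the range of edges.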
Here I compared the performance of interp1, lookup and also histc as recommended by Tasos Papastylianou:
edges = 0:0.025:1;
x = sort (rand(1,1000000));
disp ("-----INTERP1-------")
tic;bins = interp1 (edges, 1:numel(edges), x, 'previous');toc
disp ("-----HISTC-------")
tic;[ ~, bins ] = histc (x, edges);toc
disp ("-----LOOKUP-------")
tic; bins = lookup (edges, x);toc
The result:
-----INTERP1-------
Elapsed time is 0.0593688 seconds.
-----HISTC-------
Elapsed time is 0.0224149 seconds.
-----LOOKUP-------
Elapsed time is 0.0114679 seconds.
tl;dr:
x = [ 0.1576, 0.9706, 0.9572, 0.4854, 0.8003, 0.1419, 0.4218, 0.9157, 0.7922, 0.9595 ]
[ ~, Bins ] = histc( x, 0: 0.25: 1 )
% Bins = 1 4 4 2 4 1 2 4 4 4
Explanation:
According to the MATLAB manual:
Earlier versions of MATLAB® use the hist and histc functions as the primary way to create histograms and calculate histogram bin counts [...] The use of hist and histc in new code is discouraged [...] histogram, histcounts, and discretize are the recommended histogram creation and computation functions for new code.
and
The behavior of discretize is similar to that of the histcounts function. Use histcounts to find the number of elements in each bin. On the other hand, use discretize to find which bin each element belongs to (without counting).
Octave has not yet implemented discretize, but it still supports histc which, as the above implies, can do the same job through a different interface.
According to the Octave documentation of histc:
-- [N, IDX] = histc ( X, EDGES )
Compute histogram counts.
[...]
When a second output argument is requested an index matrix is also
returned. The IDX matrix has the same size as X. Each element of
IDX contains the index of the histogram bin in which the
corresponding element of X was counted.
Therefore the answer to your problem is
[ ~, Bins ] = histc( x, 0:0.25:1 )
Using your example:
x = [ 0.1576, 0.9706, 0.9572, 0.4854, 0.8003, 0.1419, 0.4218, 0.9157, 0.7922, 0.9595 ]
[ ~, Bins ] = histc( x, 0: 0.25: 1 )
% Bins = 1 4 4 2 4 1 2 4 4 4
PS. If you like the interface provided by discretize, you can easily create this function by yourself, by wrapping histc appropriately:
discretize = @(X, EDGES) nthargout( 2, @histc, X, EDGES )
You can now use this discretize function directly as in your example.
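One caveat, which comes from the documented behaviour of the two functions rather than from the answer above: histc and discretize differ at the boundaries. discretize puts a value equal to the last edge into the last bin and returns NaN for values outside the edges, while histc reports a value equal to the last edge as an extra bin numel(edges) and returns 0 for values outside the edges. A minimal sketch of how the histc indices could be adjusted, with made-up sample values chosen to hit those cases:

```matlab
% Sketch: post-process histc indices so the boundaries follow the discretize convention.
edges = 0:0.25:1;
x = [-0.1, 0, 0.25, 0.99, 1, 1.2];         % illustrative values, including both boundaries

[~, bins] = histc(x, edges);               % raw histc bin indices
bins(x == edges(end)) = numel(edges) - 1;  % fold x == last edge into the last bin
bins(bins == 0) = NaN;                     % out-of-range values become NaN

disp(bins)
```

For inputs that lie strictly inside the edges, as in the question, this adjustment changes nothing.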
Suppose I have a column vector of formulae like this
N =
4*k2 + 5*k3 + k1*x
7*k2 + 8*k3 + k1*y
and a column vector of symbolic variables like this
k =
k1
k2
k3
The formulae are linear with respect to k. I'd like to find a matrix M such that M*k equals N.
I can do this with N/k. However, that gives
[ (4*k2 + 5*k3 + k1*x)/k1, 0, 0]
[ (7*k2 + 8*k3 + k1*y)/k1, 0, 0]
which is correct, but not what I want. What I want is the matrix
x 4 5
y 7 8
which seems to me the simplest answer in that it involves no variables from k.
How do I convince Matlab to factor out the specified variables from a formula or a vector of formulae?
You can use coeffs, specifically the form
C = coeffs(p,vars) returns coefficients of the multivariate polynomial p with respect to the variables vars.
Since the first input needs to be a polynomial, you need to pass each component of N:
coeffs(N(1), k)
coeffs(N(2), k)
Or use a loop and store all results in a symbolic array:
result = sym('result', [numel(N) numel(k)]); % create symbolic array
for m = 1:numel(N)
    result(m,:) = coeffs(N(m), k);
end
In your example, this gives
result =
[ 5, 4, x]
[ 8, 7, y]
Based on @LuisMendo's answer, I used coeffs. But there are a couple of problems with coeffs. The first is that its result doesn't include any coefficients that are 0. The second is that it doesn't seem to guarantee that the coefficients are ordered the same way as the variables in its second argument. I came up with the following function to replace coeffs.
Luckily coeffs returns a second result that lists the variables associated with each item in the first result. (It's more complicated if the formula is not linear.)
function m = factorFormula(f, v)
    % Pre: f is a 1x1 sym representing a
    %      linear function of the variables in v.
    % Pre: v is a column vector of variables
    % Post: m is a row vector such that m*v equals f
    %       and the formulas in m do not contain the
    %       variables in v
    [cx, tx] = coeffs(f, v);      % cx: coefficients, tx: the term each coefficient belongs to
    n = size(v, 1);
    m = sym(zeros(1, n));         % coefficients that coeffs omits stay 0
    for i = 1:n
        j = find(tx == v(i));     % position of variable v(i) among the returned terms
        if size(j, 2) == 1
            m(i) = cx(j);
        end
    end
end
This only works for one formula, but it can be extended to a vector using the loop in @LuisMendo's answer or this equivalent expression from @Sanchises' comment there.
cell2sym(arrayfun( @(f)factorFormula(f,k),N,'UniformOutput',false ) )
I hope there is a better answer than this.
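One possible alternative, which is not taken from the answers above and is offered only as a sketch: when every formula in N is linear in k and has no constant term, the matrix M is simply the Jacobian of N with respect to k, so the Symbolic Math Toolbox function jacobian can return it directly. The name residual is made up for the check.

```matlab
% Sketch (requires the Symbolic Math Toolbox): coefficient matrix via the Jacobian.
syms k1 k2 k3 x y
N = [4*k2 + 5*k3 + k1*x; 7*k2 + 8*k3 + k1*y];
k = [k1; k2; k3];

M = jacobian(N, k);            % entry (i,j) is the derivative of N(i) with respect to k(j)
residual = simplify(M*k - N);  % all zeros if N is linear in k with no constant term

disp(M)
disp(residual)
```

If a formula is not linear in k, or contains a term without any variable from k, then M*k no longer equals N, so the residual check is worth keeping.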
I have a vector of simple numbers such as:
a=[1 2 3 4 5 6 7 8]
I would like to have all the numbers of the vector that fall in between [25% 75%] quartiles. However, when I use the command below:
quantile(a,[0.25 0.75])
It only gives me two numbers, 2 and 6 (instead of 3, 4, 5, 6).
How can I get all the values in between?
Based on the mathematical definition of a quantile, the quantile() function should not be returning {3,4,5,6} given [0.25 0.75].
A quantile of a may be thought of as the inverse of the cumulative distribution function (CDF) for a. Since the CDF F_a(x) = P(a ≤ x) is a right-continuous, non-decreasing function, its (generalized) inverse F_a^(-1)(q) assigns exactly one value to each probability q.
Thus quantile(a, 0.25) can only return a single value (a scalar): conceptually, the smallest value x such that P(a ≤ x) ≥ 0.25.
However, logical indexing will do the trick. See code below.
% MATLAB R2017a
a = [1 2 3 4 5 6 7 8];
Q = quantile(a,[0.25 0.75]) % returns 25th & 75th quantiles of a
aQ = a(a>=Q(1) & a<=Q(2)) % returns elements of a between 25th & 75th quantiles (inclusive)
How would I calculate the z-score of an entire 3 dimensional matrix in Matlab?
The Matlab command zscore standardises across vectors in just one of the dimensions of multidimensional arrays.
zscore documentation: https://uk.mathworks.com/help/stats/zscore.html
Here I show two equivalent methods:
You can run edit zscore to view how the function works; alternatively, the documentation linked in your question gives the equation for zscore:
z = (x - mean(x)) / std(x)
We can calculate this manually using mean and std (standard deviation).
M = rand( 3, 5 ) * 10
>> M =
9.5929 1.4929 2.5428 9.2926 2.5108
5.4722 2.5751 8.1428 3.4998 6.1604
1.3862 8.4072 2.4352 1.966 4.7329
Z = ( M - mean(M(:)) ) / std(M(:)) % using M(:) to operate on the array as a vector
>> Z =
1.6598 -1.0771 -0.72235 1.5583 -0.73316
0.26743 -0.71145 1.1698 -0.39899 0.5
-1.1131 1.2591 -0.7587 -0.91727 0.017644
The advantage of this method is that you don't have to use the Statistics Toolbox required by zscore. The minor disadvantage is that you lose the input checking of zscore, and its protection against a standard deviation of 0.
If you want to use zscore then you can use reshape, after calculating the zscore as if it were a vector:
Z = reshape( zscore(M(:)), size(M) )
>> Z =
1.6598 -1.0771 -0.72235 1.5583 -0.73316
0.26743 -0.71145 1.1698 -0.39899 0.5
-1.1131 1.2591 -0.7587 -0.91727 0.017644
Note that both of these methods should behave the same as the standard zscore(M) for a vector input M.
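The outputs above use a 2-D matrix for readability, but the question asks about a 3-D array. A minimal sketch of the manual method on a 3-D input follows; the array size is arbitrary and chosen only for illustration.

```matlab
% Sketch: z-score over all elements of a 3-D array, without the Statistics Toolbox.
M = rand(3, 4, 2) * 10;             % example 3-D array (arbitrary size)
Z = (M - mean(M(:))) / std(M(:));   % standardise with the mean and std of all elements

% Z keeps the shape of M; over all elements its mean is ~0 and its std is ~1.
fprintf('size = [%s], mean = %.3g, std = %.3g\n', num2str(size(Z)), mean(Z(:)), std(Z(:)));
```

Because M(:) flattens the array regardless of the number of dimensions, the same two expressions work unchanged for any array shape.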
This question generalizes a previous one, "Any way for matlab to sum an array according to specified bins NOT by for iteration? Best if there is buildin function for this". I am not sure, but I tried the answers in that post and they do not seem to work with matrices.
For example, if
A = [7,8,1,1,2,2,2]; % the bins or subscripts
B = [2,1; ...
1,1; ...
1,1; ...
2,0; ...
3,1; ...
0,2; ...
2,4]; % the matrix
then the desired function "binsum" has two outputs: one is the bins, and the other is the accumulated row vectors. It adds rows of B according to the subscripts in A. For example, for 2 the sum is [3,1] + [0,2] + [2,4] = [5,7], and for 1 it is [1,1] + [2,0] = [3,1].
[bins, sums] = binsum(A,B);
bins = [1,2,7,8]
sums = [3,1;
5,7;
2,1;
1,1]
The first method, accumarray, says its "val" argument can only be a scalar or a vector. The second method, sparse, does not seem to accept a vector as the value "v" for each tuple (i,j) either. So I have to ask for help again, and I would still like to avoid iterating over the columns of B.
I am using 2017a. Many thanks again!
A way to do that is using matrix multiplication:
bins = unique(A);
sums = (A==bins.')*B;
The above is memory-expensive, as it builds an intermediate logical matrix of size M×N, where M is the number of bins and N is the length of A. Alternatively, you can build that matrix as a sparse logical matrix to save memory:
[bins, ~, labels] = unique(A);
sums = sparse(labels, 1:numel(A), true)*B;
A method based on sort and cumsum:
[s,I]=sort(A);
c=cumsum(B(I,:));
k= [s(1:end-1)~=s(2:end) true];
sums = diff([zeros(1,size(B,2)); c(k,:)])
bins=s(k)
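To see that the approaches agree, here is a small check on the A and B from the question. It runs the matrix-multiplication method and the sort/cumsum method side by side; the names sums_mult and sums_cumsum are made up for the comparison, and the A == bins.' step assumes implicit expansion (MATLAB R2016b or later, or Octave).

```matlab
% Sketch: run two of the methods above on the example data and compare the results.
A = [7,8,1,1,2,2,2];                          % the bins or subscripts
B = [2,1; 1,1; 1,1; 2,0; 3,1; 0,2; 2,4];      % the matrix

% Matrix multiplication: row i of (A == bins.') marks the rows of B belonging to bins(i).
bins = unique(A);
sums_mult = (A == bins.') * B;

% Sort + cumsum: differences of the cumulative sum taken at the last row of each group.
[s, I] = sort(A);
c = cumsum(B(I,:));
k = [s(1:end-1) ~= s(2:end), true];
sums_cumsum = diff([zeros(1, size(B,2)); c(k,:)]);

disp(bins)
disp(sums_mult)
disp(isequal(sums_mult, sums_cumsum))
```

Both methods return the bins in sorted order, so each row of the sums lines up with the corresponding entry of unique(A).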
Possible Duplicate:
Generate random number with given probability matlab
I need to create a column vector with random assignments of the numbers 1, 2 and 3. However, I need to be able to control the percentage occurrence of each of these 3 numbers.
For example, I have a 100 x 1 column vector and I want 30 of the number 1, 50 of the number 2 and 20 of the number 3, in random order.
I am not sure whether you can do that with rand or randi function.
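Maybe you could write a small snippet like this:
bit1 = 1 * ones(30,1);   % 30 copies of the number 1
bit2 = 2 * ones(50,1);   % 50 copies of the number 2
bit3 = 3 * ones(20,1);   % 20 copies of the number 3
bits = [bit1; bit2; bit3];               % 100 x 1 column vector
randbits = bits(randperm(length(bits)))  % shuffle into random order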
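The same idea can be written for any set of values and counts. The sketch below assumes repelem is available (MATLAB R2015a or later, or a recent Octave); the names vals and counts are made up for illustration.

```matlab
% Sketch: exact counts per value, shuffled into random order.
vals   = [1; 2; 3];                 % the numbers to place in the vector
counts = [30; 50; 20];              % how many copies of each number are wanted

v = repelem(vals, counts);          % 100 x 1: 30 ones, then 50 twos, then 20 threes
v = v(randperm(numel(v)));          % random permutation of the entries

disp(accumarray(v, 1).')            % occurrences of each value after shuffling
```

Because the vector is built first and only then permuted, the counts are exact by construction; only the positions are random.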
You can do it using the CDF (cumulative distribution function) of the percentage of each number.
pdf = [ 30 50 20 ]/100; % the prob. distribution fun. of the samples
cdf = cumsum( pdf );
% I assume here all entries of the PDF are positive and sum(pdf)==1
% If this is not the case, you may normalize pdf to sum to 1.
The sampling itself:
n = 100; % number of samples required
v = rand(n,1); % uniform samples
tmp = bsxfun( @le, v, cdf );
[~, r] = max( tmp, [], 2 );
As observed by @Dan (see comment below), the last line can be replaced with
r = numel(pdf) + 1 - sum( tmp, 2 );
The vector r is a random vector of the integers 1, 2, 3. Each entry is drawn according to the desired pdf, so the proportions match it on average rather than exactly.
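To illustrate what this sampling gives, the sketch below repeats the steps above with a larger n (chosen here only for the illustration) and compares the observed frequencies with the pdf. The name freq is made up.

```matlab
% Sketch: empirical frequencies of CDF-based sampling for a large number of draws.
pdf = [30 50 20] / 100;                    % desired probabilities of 1, 2 and 3
cdf = cumsum(pdf);
n = 1e5;                                   % large n so the frequencies settle near pdf

v = rand(n, 1);                            % uniform samples
tmp = bsxfun(@le, v, cdf);                 % tmp(i,j) is true when v(i) <= cdf(j)
[~, r] = max(tmp, [], 2);                  % first true column = sampled integer

freq = accumarray(r, 1, [numel(pdf) 1]).' / n;   % observed share of each integer
disp(freq)
```

With n = 100 the counts will usually be close to, but not exactly, 30/50/20; if exact counts are required, building the vector with the wanted counts and shuffling it is the safer route.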