Using boxplot with different length of vectors - matlab

Hi, I am trying to make a boxplot of hourly data values for different months, so that one diagram shows a boxplot for January, February, March and so on. Because the number of hours varies from month to month, boxplot always gives me an error.
code
X=[N11(:,9) D12(:,9) J1(:,9) F2(:,9) ];
G=[1 2 3 4];
boxplot(X,G)
size of data:
J1=744
F2=624
D12=744
N11=720
thanks matthias

You can manually append all of your data together in a single vector and then create a grouping variable g whose label indicates to which group a data point belongs on the corresponding row. For example:
A = randn(10, 1); B = randn(12, 1); C = randn(4, 1);
g = [repmat(1, [10, 1]) ; repmat(2, [12, 1]); repmat(3, [4, 1])];
figure; boxplot([A; B; C], g);

Similar questions have been asked before. See:
http://www.mathworks.com/matlabcentral/answers/60818-boxplot-with-vectors-of-different-lengths
Basically, you put all the data in a 1-D array and use another 1-D array (of the same length) to label the groups.
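Applied to the four months in the question, the idea might look like the sketch below. The random vectors are hypothetical stand-ins for N11(:,9), D12(:,9), J1(:,9) and F2(:,9), with lengths taken from the sizes listed in the question; the month labels are also an assumption. boxplot needs the Statistics and Machine Learning Toolbox, and repelem needs R2015a or later.

```matlab
% Hypothetical stand-ins for the asker's column-9 data (one value per hour).
N11 = randn(720, 1);
D12 = randn(744, 1);
J1  = randn(744, 1);
F2  = randn(624, 1);

months = {N11, D12, J1, F2};

% Stack all values into one long column vector.
x = vertcat(months{:});

% Build the grouping vector: 1 repeated 720 times, 2 repeated 744 times, etc.
n = cellfun(@numel, months);
g = repelem((1:numel(months))', n(:));

% One box per group; the label names are assumptions for illustration.
figure;
boxplot(x, g, 'Labels', {'Nov', 'Dec', 'Jan', 'Feb'});
```

Because the group sizes are read from the data with numel, the same code keeps working when a month has missing hours.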

Related

How to neatly sum values of histogram bins, given a value matrix and an associated category association matrix (bin association)

Consider the example following below, where I have a 10x10 matrix, say A, of random values in some range, say [-5, 5]. I quantize the values of A into 8 categories, 1, ..., 8, such that an additional 10x10 matrix, say qA, describes the category association for each number in A. Finally, I produce the sums of all values assigned to each category. My question regards this final step.
myRange = 5; % values in open interval (-myRange, myRange)
A = myRange*(2*rand(10) - 1);
qA = uencode(A, 3, myRange)+1;
% (+) create "histogram" of sum of values assigned to each bin
myHistogram = zeros(8,1);
for i = 1:numel(A)
myHistogram(qA(i)) = myHistogram(qA(i)) + A(i);
end
bar(myHistogram)
Question: Is there some neater way of doing this, specifically the counting step (+) above? (Some better alternative than explicitly iterating over each element in the matrix A?)
Just as I was about to finish up and post my question I found a satisfying answer to it, however not here on SO. As self-answering is encouraged, I'll post the Q+A instead of aborting this Q posting.
Hence, based on the following Matlab Central thread, one neater solution is as follows:
myRange = 5; % values in open interval (-myRange, myRange)
A = myRange*(2*rand(10) - 1);
qA = uencode(A, 3, myRange)+1;
% or, if you don't have the Signal Processing Toolbox required for 'uencode'
% [~, ~, qA] = histcounts(A, -myRange:myRange/4:myRange);
% (+) create "histogram" of sum of values assigned to each bin
myHistogram = accumarray(qA(:), A(:), [8 1])
Possibly there are alternative or even better ways to do this, such as performing the quantization and the bin value summation in the same step?
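As a small check that accumarray really does the same job as the explicit loop, here is a sketch on a hand-picked matrix. It bins with discretize (base MATLAB, R2015a or later) instead of uencode, so the bin assignment is an assumption and may differ from uencode at the bin edges; the values in A are made up for illustration.

```matlab
% Hypothetical small data set with values inside [-5, 5].
A = [-4.2  0.3;
      1.1 -4.9;
      4.7  0.9];

% Eight equal-width bins covering [-5, 5].
edges = -5:1.25:5;
bin = discretize(A, edges);      % bin index (1..8) for every element of A

% Vectorized: sum the values of A that fall into each bin.
sumsVec = accumarray(bin(:), A(:), [8 1]);

% Reference: the explicit loop from the question.
sumsLoop = zeros(8, 1);
for k = 1:numel(A)
    sumsLoop(bin(k)) = sumsLoop(bin(k)) + A(k);
end
```

Both vectors hold the same per-bin sums; bins that receive no values stay at zero because the output size [8 1] is given explicitly.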

Nx2 matrix of points [x1 y1; x2 y2; etc.], get highest y value for each unique x

I'm trying to find an idiomatic way to do this.
Essentially I have an Nx2 matrix of points of the form
A = [3 4; 3 5; 4 5; 4, 6; 7 3]
I'd like my output to be [3 5; 4 6; 7 3]. In other words I would like each unique x value along with the maximum y value associated with that x.
I was hoping there would be some sort of
unique(A, 'rows', 'highestterm', 2)
method for accomplishing this, but couldn't find anything. Can anyone think of a vectorized way to solve this problem? I can do it pretty easily in a for loop, but would like to avoid that if possible.
I don't know of any single call, like you hoped. But, it can be done fairly tightly (and fully vectorized) like the code below.
%sort by first and then second column
A = sortrows(A,[1 2]);
%find each change in the first column of A
inds = find(diff(A(:,1)) > 0);
%add the last point...because find(diff) doesn't get the last point
inds(end+1) = size(A,1);
%get just those rows that meet the desired criteria
A = A(inds,:);
So, this works by sorting the data and looking for the values in the first column that don't repeat. If there are repeated values, this code grabs the last of the repeating values. Finally, because we sorted by both columns via sortrows(A,[1 2]), the last entry for a repeating value will have the biggest corresponding value from the 2nd column. I think that this hits all of your requirements.
Using accumarray and unique:
[r1, ~, u] = unique(A(:,1));
r2 = accumarray(u, A(:,2), [], @max);
result = [r1 r2];
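To see that the two approaches agree, here is a short sketch that runs both on the points from the question, deliberately shuffled so that the sort actually matters. The variable names are chosen for illustration only.

```matlab
% The question's points in shuffled order.
A = [4 6; 3 4; 7 3; 3 5; 4 5];

% Approach 1: sort by x then y, and keep the last row of each run of equal x.
S = sortrows(A, [1 2]);
lastOfRun = [diff(S(:,1)) > 0; true];   % the final row always ends a run
viaSort = S(lastOfRun, :);

% Approach 2: group by unique x and take the maximum y within each group.
[x, ~, idx] = unique(A(:,1));
viaAccum = [x, accumarray(idx, A(:,2), [], @max)];
```

Both viaSort and viaAccum come out sorted by x, since sortrows and unique each sort their output by default.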

Find blocks of same dates in a vector and average over the corresponding block of data in another vector

I'm new to Matlab and I'm looking for a solution to a problem of determining blocks of same dates in one vector and to average over the corresponding block of data in another vector.
Given is a vector consisting of several blocks of dates in the format 'dd-mmm-yyyy'. The blocks with same dates can have variable length. An example would be
T= ['03-Jan-2013';
'03-Jan-2013';
'03-Jan-2013';
'04-Jan-2013';
'04-Jan-2013';
'05-Jan-2013']
Each date in T corresponds to a data entry in another vector H (for simplicity same dates from T have here the same corresponding number in H)
H= [1;
1;
1;
5;
5;
6]
The goal is now to determine the average of the elements of H which correspond to the same dates and return a modified date and data vector Tout and Hout which would look like this:
Tout=['03-Jan-2013';
'04-Jan-2013';
'05-Jan-2013']
and
Hout=[1;
5;
6]
where Hout represents the averaged values.
Both vectors are initially drawn from a textfile and can have a length of about 100k.
So looping is probably not the best thing to do.
I appreciate any help!
Use unique to get the unique dates and their multiplicity and accumarray to average over the ones that are repeated
[Tout,~,n] = unique(T, 'rows');
Hout = accumarray(n, H, [], @mean);
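A runnable sketch of this on the dates from the question follows. The values in H are changed (an assumption for illustration) so that the averaging is visible. Note that unique sorts character rows lexically, which is not chronological for 'dd-mmm-yyyy' strings in general, so the sketch passes 'stable' to keep the dates in their order of first appearance.

```matlab
T = ['03-Jan-2013';
     '03-Jan-2013';
     '03-Jan-2013';
     '04-Jan-2013';
     '04-Jan-2013';
     '05-Jan-2013'];
H = [1; 2; 3; 5; 7; 6];    % hypothetical data, one value per date row

% n(k) is the row of Tout that T(k,:) maps to.
[Tout, ~, n] = unique(T, 'rows', 'stable');

% Average all H values that share the same date.
Hout = accumarray(n, H, [], @mean);
```

Here Hout holds the means of {1,2,3}, {5,7} and {6}, one per row of Tout.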

Efficient aggregation of high dimensional arrays

I have a 3 dimensional (or higher) array that I want to aggregate by another vector. The specific application is to take daily observations of spatial data and average them to get monthly values. So, I have an array with dimensions <Lat, Lon, Day> and I want to create an array with dimensions <Lat, Lon, Month>.
Here is a mock example of what I want. Currently, I can get the correct output using a loop, but in practice, my data is very large, so I was hoping for a more efficient solution than the second loop:
% Make the mock data
A = [1 2 3; 4 5 6];
X = zeros(2, 3, 9);
for j = 1:9
X(:, :, j) = A;
A = A + 1;
end
% Aggregate the X values in groups of 3 -- This is the part I would like help on
T = [1 1 1 2 2 2 3 3 3];
X_agg = zeros(2, 3, 3);
for i = 1:3
X_agg(:,:,i) = mean(X(:,:,T==i),3);
end
In 2 dimensions, I would use accumarray, but that does not accept higher dimension inputs.
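Before getting to your answer, let's first rewrite your code in a more general way:
ag = 3; % aggregation block size (number of slices averaged together)
nGroups = size(X, 3) / ag; % number of blocks; assumes size(X,3) is a multiple of ag
X_agg = zeros(size(X, 1), size(X, 2), nGroups);
for i = 1:nGroups
X_agg(:,:,i) = mean(X(:,:,(i-1)*ag+1:i*ag), 3);
end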
To avoid using the for loop one idea is to reshape your X matrix to something that you can use the mean function directly on.
splited_X = reshape(X, [size(X,1), size(X,2), ag, size(X,3)/ag]);
So now splited_X(:,:,:,i) is the i-th part, which contains all the matrices that should be aggregated together, i.e. X(:,:,(i-1)*ag+1:i*ag) (like above).
Now you just need to find the mean in the 3rd dimension of splited_X:
temp = mean(splited_X, 3);
However, this results in a 4D matrix whose 3rd dimension has size 1. You can turn it back into a 3D matrix using the reshape function:
X_agg = reshape(temp, size(X_agg))
I have not tried it to see how much more efficient it is, but it should do better for large matrices since it doesn't use for loops.
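The reshape idea can be checked against the loop from the question with the sketch below. It also includes a variant that is not from the answer: a matrix product with a normalized group-indicator matrix, which is one possible way to handle groups of unequal size (such as months with different numbers of days). The names nGroups, W and X_aggW are assumptions for illustration.

```matlab
% Mock data from the question: 9 "days" of a 2x3 grid.
A = [1 2 3; 4 5 6];
X = zeros(2, 3, 9);
for j = 1:9
    X(:, :, j) = A + (j - 1);
end
T = [1 1 1 2 2 2 3 3 3];          % group (month) label for every day

% Reference result: the loop from the question.
X_ref = zeros(2, 3, 3);
for i = 1:3
    X_ref(:, :, i) = mean(X(:, :, T == i), 3);
end

% Variant 1 (equal, contiguous blocks): reshape, then average over dim 3.
ag = 3;                            % days per block
nGroups = size(X, 3) / ag;
X4 = reshape(X, size(X, 1), size(X, 2), ag, nGroups);
X_agg = reshape(mean(X4, 3), size(X, 1), size(X, 2), nGroups);

% Variant 2 (arbitrary labels in T): days-by-groups averaging weights.
W = full(sparse(1:numel(T), T, 1, numel(T), max(T)));  % indicator matrix
W = bsxfun(@rdivide, W, sum(W, 1));                    % columns sum to 1
X2 = reshape(X, [], numel(T));                         % (Lat*Lon) x Day
X_aggW = reshape(X2 * W, size(X, 1), size(X, 2), []);
```

Variant 1 only applies when every group has the same number of consecutive slices. Variant 2 only relies on the labels in T, at the cost of building the weight matrix.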

Creating a vector with random sampling of two vectors in matlab

How does one create a vector that is composed of a random sampling of two other vectors?
For example
Vector 1 [1, 3, 4, 7], Vector 2 [2, 5, 6, 8]
Random Vector [random draw from vector 1 or 2 (value 1 or 2), random draw from vector 1 or 2 (value 3 or 5)... etc]
Finally, how can one ask matlab to repeat this process n times to draw a distribution of results?
Thank you,
There are many ways you could do this. One possibility is:
tmp=round(rand(size(vector1)))
res = tmp.*vector1 + (1-tmp).*vector2
To get one mixed sample, you may use the idea of the following code snippet (not the optimal one, but maybe clear enough):
a = [1, 3, 4, 7];
b = [2, 5, 6, 8];
selector = randn(size(a));
sample = a.*(selector>0) + b.*(selector<=0);
For n samples put the above code in a for loop:
for k=1:n
% Sample code from above (without the initial assignments of a and b)
% Here do stuff with the sample
end;
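If the loop is only needed to collect n samples, one possible alternative is to draw all of them at once, one sample per row. This is a sketch under the assumption that each position should come from a or b with equal probability; n and the variable names are made up for illustration.

```matlab
a = [1, 3, 4, 7];
b = [2, 5, 6, 8];
n = 1000;                               % number of mixed samples to draw

% pickA(r, c) is true when sample r takes position c from a, else from b.
pickA = rand(n, numel(a)) < 0.5;

% Each row of samples is one mixed vector, e.g. [1 5 4 8].
samples = bsxfun(@times, pickA, a) + bsxfun(@times, ~pickA, b);
```

Each column of samples then holds the empirical distribution for one position, which can be passed to a histogram function column by column.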
More generally, if X is a matrix and for each row you want to take a sample from a column chosen at random, you can do this with a loop:
y = zeros(size(X,1),1);
for ii = 1:size(X,1)
y(ii) = X(ii,ceil(rand*size(X,2)));
end
You can avoid the loop using clever indexing via sub2ind:
idx_n = ceil(rand(size(X,1),1)*size(X,2));
idx = sub2ind(size(X),(1:size(X,1))',idx_n);
y = X(idx);
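For the two vectors in the question, X can be built by placing them side by side as columns. The sketch below does that and uses randi in place of ceil(rand*...) to draw the column indices; the choice of randi is a substitution for readability, not part of the answer above.

```matlab
vector1 = [1 3 4 7];
vector2 = [2 5 6 8];

% One row per position, one column per source vector.
X = [vector1(:), vector2(:)];
nRows = size(X, 1);

% Random column index (1 or 2) for every row.
col = randi(size(X, 2), nRows, 1);

% Convert (row, column) pairs to linear indices and pick the elements.
y = X(sub2ind(size(X), (1:nRows)', col));
```

The result y keeps the original position order, so y(k) is either vector1(k) or vector2(k).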
If I understand your question, you are choosing two random numbers. First you decide whether to select vector 1 or vector 2; next you pick an element from the chosen vector.
The following code takes advantage of the fact that vector1 and vector2 are the same length:
N = 1000;
sampleMatrix = [vector1 vector2];
M = numel(sampleMatrix);
randIndex = ceil(rand(1,N)*M); % N random numbers from 1 to M
randomNumbers = sampleMatrix(randIndex); % sample N times from the matrix
You can then display the result with, for instance
figure; hist(randomNumbers); % draw a histogram of numbers drawn
When vector1 and vector2 have different lengths, you run into a problem. If you concatenate them, you will end up picking elements from the longer vector more often. One way around this is to create random samplings from both arrays, then choose between them:
M1 = numel(vector1);
M2 = numel(vector2);
r1 = ceil(rand(1,N)*M1);
r2 = ceil(rand(1,N)*M2);
randMat = [reshape(vector1(r1), [], 1), reshape(vector2(r2), [], 1)]; % force two columns (works for row or column inputs), now pick one or the other
randPick = ceil(rand(1,N)*2);
randomNumbers = [randMat(randPick==1, 1); randMat(randPick==2, 2)];
On re-reading, maybe you just want to pick "element 1 from either 1 or 2", then "element 2 from either 1 or 2", etc for all the elements of the vector. In that case, do
N=numel(vector1);
randPick = ceil(rand(1,N)*2);
randMat=[vector1(:) vector2(:)];
randomNumbers = randMat(sub2ind(size(randMat), 1:N, randPick)); % keeps the element order: entry k comes from column randPick(k)
This problem can be solved using the function datasample.
Combine both vectors into one and apply the function. I like this approach more than the handcrafted versions in the other answers. It gives you much more flexibility in choosing what you actually want, while being a one-liner.
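A minimal sketch of that approach follows. datasample is part of the Statistics and Machine Learning Toolbox and samples with replacement by default. The longer second vector and the weights are assumptions added for illustration: the 'Weights' option is one way to give both source vectors the same overall chance when their lengths differ.

```matlab
vector1 = [1 3 4 7];
vector2 = [2 5 6 8 9 10];     % hypothetical longer second vector
pool = [vector1, vector2];
N = 1000;                     % number of draws

% Plain draw: every element of the pool is equally likely,
% so the longer vector contributes more often.
plain = datasample(pool, N);

% Weighted draw: each vector gets the same total weight.
w = [ones(1, numel(vector1)) / numel(vector1), ...
     ones(1, numel(vector2)) / numel(vector2)];
balanced = datasample(pool, N, 'Weights', w);
```

Either result can be shown with a histogram to inspect the distribution of drawn values.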