Taking hourly averages in MATLAB with a vector - matlab

I have a vector in the form of
0 21.3400
0 22.3000
1 22.3000
The left column is the hour and the right column is the value. I need to calculate the averages for each hour. The problem is that my samples run for longer than 24 hours (multiple days), so it would loop back from 0-23 to 0-23 again. Another problem is that sometimes I am missing samples for a certain hour. For example:
12.0000 29.5000
14.0000 35.7400
Any ideas on how I can solve this problem?

The part "so it would loop back from 0-23 to 0-23 again" is unclear to me. Maybe you are looking for the modulo function mod(). But after you have solved that particular problem, your averaging problem can be taken care of using accumarray. It is like the perfect use case for this function.
%// your data
data = [
0 21.3400
0 22.3000
1 22.3000
12.0000 29.5000
14.0000 35.7400];
%// group (find subs)
[hours, b, subs] = unique(data(:,1));
%// apply function mean to grouped data
avg = accumarray(subs, data(:,2), [], @mean);
result = [hours, avg]
In result the output is stored table-like: the first column holds the unique hours and the second column the averaged data values for those hours.
result =
0 21.8200
1.0000 22.3000
12.0000 29.5000
14.0000 35.7400
As an example: for the hour 0 the average of the data values 21.3400 and 22.300 is correctly computed as (21.3400 + 22.300)/2, which equals 21.8200.
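If your hour stamps instead keep counting upwards across days (e.g. 25 meaning 1 a.m. of day two), the mod() idea mentioned above folds them back onto a 0-23 clock before grouping. A minimal sketch, assuming the first column holds such cumulative hours:
%// fold cumulative hours back onto a 0-23 clock, then group as before
hourOfDay = mod(data(:,1), 24);
[hours, ~, subs] = unique(hourOfDay);
result = [hours, accumarray(subs, data(:,2), [], @mean)]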

Related

How can I vectorize code that runs a function on subsets of a larger matrix?

Let's assume I have the following 9 x 5 matrix:
myArray = [
54.7 8.1 81.7 55.0 22.5
29.6 92.9 79.4 62.2 17.0
74.4 77.5 64.4 58.7 22.7
18.8 48.6 37.8 20.7 43.5
68.6 43.5 81.1 30.1 31.1
18.3 44.6 53.2 47.0 92.3
36.8 30.6 35.0 23.0 43.0
62.5 50.8 93.9 84.4 18.4
78.0 51.0 87.5 19.4 90.4
];
I have 11 "subsets" of this matrix and I need to run a function (let's say max) on each of these subsets. The subsets can be identified with the following matrix of logicals (identified column-wise, not row-wise):
myLogicals = logical([
0 1 0 1 1
1 1 0 1 1
1 1 0 0 0
0 1 0 1 1
1 0 1 1 1
1 1 1 1 0
0 1 1 0 1
1 1 0 0 1
1 1 0 0 1
]);
or via linear indexing:
starts = [2 5 8 10 15 23 28 31 37 40 43]; %// index start of each subset
ends = [3 6 9 13 18 25 29 33 38 41 45]; %// index end of each subset
such that the first subset is 2:3, the second is 5:6, and so on.
I can find the max of each subset and store it in a vector as follows:
finalAnswers = NaN(11,1);
for n=1:length(starts) %// i.e. 1 through the number of subsets
finalAnswers(n) = max(myArray(starts(n):ends(n)));
end
After the loop runs, finalAnswers contains the maximum value of each of the data subsets:
74.4 68.6 78.0 92.9 51.0 81.1 62.2 47.0 22.5 43.5 90.4
Is it possible to obtain the same result without the use of a for loop? In other words, can this code be vectorized? Would such an approach be more efficient than the current one?
EDIT:
I did some testing of the proposed solutions. The data I used was a 1,510 x 2,185 matrix with 10,103 subsets that varied in length from 2 to 916 with a standard deviation of subset length of 101.92.
I wrapped each solution in tic;for k=1:1000 [code here] end; toc; and here are the results:
for loop approach --- Elapsed time is 16.237400 seconds.
Shai's approach --- Elapsed time is 153.707076 seconds.
Dan's approach --- Elapsed time is 44.774121 seconds.
Divakar's approach #2 --- Elapsed time is 127.621515 seconds.
Notes:
I also tried benchmarking Dan's approach by wrapping the k=1:1000 for loop around just the accumarray line (since the rest could theoretically be run just once). In this case the time was 28.29 seconds.
Benchmarking Shai's approach while leaving the lb = ... line out of the k loop gave a time of 113.48 seconds.
When I ran Divakar's code, I got Non-singleton dimensions of the two input arrays must match each other. errors for the bsxfun lines. I "fixed" this by using conjugate transposition (the apostrophe operator ') on trade_starts(1:starts_extent) and intv(1:starts_extent) in the lines of code calling bsxfun. I'm not sure why this error was occurring...
I'm not sure if my benchmarking setup is correct, but it appears that the for loop actually runs the fastest in this case.
One approach is to use accumarray. Unfortunately in order to do that we first need to "label" your logical matrix. Here is a convoluted way of doing that if you don't have the image processing toolbox:
sz=size(myLogicals);
s_ind(sz(1),sz(2))=0;
%// OR: s_ind = zeros(size(myLogicals))
s_ind(starts) = 1;
labelled = cumsum(s_ind(:)).*myLogicals(:);
That does just what Shai's bwlabeln implementation does (but labelled here will be numel(myLogicals)-by-1 in shape, as opposed to size(myLogicals) in shape).
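As a sanity check, here is the labelling on a tiny made-up vector (the values are assumptions, not your data):
myLogicals_small = logical([0 1 1 0 1 1 1]'); %// two runs of trues
s = zeros(numel(myLogicals_small), 1);
s([2 5]) = 1; %// a 1 at the start of each run
labelled_small = cumsum(s).*myLogicals_small %// gives [0 1 1 0 2 2 2]'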
Now you can use accumarray:
accumarray(labelled(myLogicals), myArray(myLogicals), [], @max)
or else it may be faster to try
result = accumarray(labelled+1, myArray(:), [], @max);
result = result(2:end)
This is fully vectorized, but is it worth it? You'll have to do speed tests against your loop solution to know.
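For example, a minimal tic/toc harness along those lines (num_runs is an arbitrary assumption):
num_runs = 100; %// arbitrary repetition count
tic
for k = 1:num_runs
    res_vec = accumarray(labelled(myLogicals), myArray(myLogicals), [], @max);
end
toc
tic
for k = 1:num_runs
    res_loop = NaN(numel(starts), 1);
    for n = 1:numel(starts)
        res_loop(n) = max(myArray(starts(n):ends(n)));
    end
end
toc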
Use bwlabeln with a vertical connectivity:
lb = bwlabeln( myLogicals, [0 1 0; 0 1 0; 0 1 0] );
Now you have a label 1..11 for each region.
To get max value you can use regionprops
props = regionprops( lb, myArray, 'MaxIntensity' );
finalAnswers = [props.MaxIntensity];
You can use regionprops to get some other properties of each subset, but it is not too general.
If you wish to apply a more general function to each region, e.g., median, you can use accumarray:
finalAnswer = accumarray( lb( myLogicals ), myArray( myLogicals ), [], @median );
Ideas behind vectorization and optimization
One of the approaches that one can employ to vectorize this problem is to convert the subsets into regular-shaped blocks and then find the max of the elements of those blocks in one go. Converting to regular-shaped blocks has one issue here: the subsets are unequal in length. To avoid this issue, one can create a 2D matrix of indices starting from each of the starts elements and extending up to the maximum of the subset lengths. The good thing about this is that it allows vectorization, but at the cost of more memory, which would depend on the scattered-ness of the subset lengths.
Another issue with this vectorization technique would be that it could potentially lead to out-of-limits indices creations for final subsets.
To avoid this, one can think of two possible ways -
1. Use a bigger input array, extending the input array such that the maximum of the subset lengths plus the starts indices still lies within the confines of the extended array.
2. Use the original input array for the starts that stay within its limits, and then fall back to the original loop code for the rest of the subsets. We can call it mixed programming, just for the sake of a short title. This saves the memory otherwise needed for the extended array of the first approach.
These two ways/approaches are listed next.
Approach #1: Vectorized technique
[m,n] = size(myArray); %// store no. of rows and columns in input array
intv = ends-starts; %// intervals
max_intv = max(intv); %// max interval
max_intv_arr = [0:max_intv]'; %// array of max indices extent
[row1,col1] = ind2sub([m n],starts); %// get starts row and column indices
m_ext = max(row1+max_intv); %// no. of rows in extended input array
myArrayExt(m_ext,n)=0; %// extended form of input array
myArrayExt(1:m,:) = myArray;
%// New linear indices for extended form of input array
idx = bsxfun(@plus,max_intv_arr,(col1-1)*m_ext+row1);
%// Index into extended array; select only valid ones by setting rest to nans
selected_ele = myArrayExt(idx);
selected_ele(bsxfun(@gt,max_intv_arr,intv)) = nan;
%// Get the max of the valid ones for the desired output
out = nanmax(selected_ele); %// desired output
Approach #2: Mixed programming
%// PART - I: Vectorized technique for subsets that when normalized
%// with max extents still lie within limits of input array
intv = ends-starts; %// intervals
max_intv = max(intv); %// max interval
%// Find the last subset that when extended by max interval would still
%// lie within the limits of input array
starts_extent = find(starts+max_intv<=numel(myArray),1,'last');
max_intv_arr = [0:max_intv]'; %// array of max indices extent
%// Index into extended array; select only valid ones by setting rest to nans
selected_ele = myArray(bsxfun(@plus,max_intv_arr,starts(1:starts_extent)));
selected_ele(bsxfun(@gt,max_intv_arr,intv(1:starts_extent))) = nan;
out(numel(starts)) = 0; %// storage for output
out(1:starts_extent) = nanmax(selected_ele); %// output values for part-I
%// PART - II: Process rest of input array elements
for n = starts_extent+1:numel(starts)
out(n) = max(myArray(starts(n):ends(n)));
end
Benchmarking
In this section we will compare the two approaches and the original loop code against each other for performance. Let's set up the code before starting the actual benchmarking -
N = 10000; %// No. of subsets
M1 = 1510; %// No. of rows in input array
M2 = 2185; %// No. of cols in input array
myArray = rand(M1,M2); %// Input array
num_runs = 50; %// no. of runs for each method
%// Form the starts and ends by getting a sorted random integers array from
%// 1 to one minus no. of elements in input array. That minus one is
%// compensated later on into ends because we don't want any subset with
%// starts and ends as the same index
y1 = reshape(sort(randi(numel(myArray)-1,1,2*N)),2,[]);
starts = y1(1,:);
ends = y1(2,:)+1;
%// Remove identical starts elements
invalid = [false any(diff(starts,[],2)==0,1)];
starts = starts(~invalid);
ends = ends(~invalid);
%// Create myLogicals
myLogicals = false(size(myArray));
for k1=1:numel(starts)
myLogicals(starts(k1):ends(k1))=1;
end
clear invalid y1 k1 M1 M2 N %// clear unnecessary variables
%// Warm up tic/toc.
for k = 1:100
tic(); elapsed = toc();
end
Now, the skeleton code that gets us the runtimes -
disp('---------------------- With Original loop code')
tic
for iter = 1:num_runs
%// ...... code for the approach being timed
end
toc
%// clear out variables used in the above approach
%// repeat this for approach #1,2
Benchmark Results
In your comments, you mentioned using a 1510 x 2185 matrix, so let's do two case runs with that size and 10000 and 2000 subsets, plus a bigger third case.
Case 1 [Input - 1510 x 2185 matrix, Subsets - 10000]
---------------------- With Original loop code
Elapsed time is 15.625212 seconds.
---------------------- With Approach #1
Elapsed time is 12.102567 seconds.
---------------------- With Approach #2
Elapsed time is 0.983978 seconds.
Case 2 [Input - 1510 x 2185 matrix, Subsets - 2000]
---------------------- With Original loop code
Elapsed time is 3.045402 seconds.
---------------------- With Approach #1
Elapsed time is 11.349107 seconds.
---------------------- With Approach #2
Elapsed time is 0.214744 seconds.
Case 3 [Bigger Input - 3000 x 3000 matrix, Subsets - 20000]
---------------------- With Original loop code
Elapsed time is 12.388061 seconds.
---------------------- With Approach #1
Elapsed time is 12.545292 seconds.
---------------------- With Approach #2
Elapsed time is 0.782096 seconds.
Note that the number of runs num_runs was varied to keep the runtime of the fastest approach close to 1 sec.
Conclusions
So, I guess the mixed programming (approach #2) is the way to go! As future work, one could incorporate the standard deviation of the subset lengths into the scattered-ness criterion and, if performance suffers because of scattered-ness, offload the most scattered subsets (in terms of their lengths) to the loop code.
Efficiency
Measure both the vectorised & for-loop code samples on your respective platform ( be it a <localhost> or Cloud-based ) to see the difference:
MATLAB:7> tic();max( myArray( startIndex(:):endIndex(:) ) );toc() %% Details
Elapsed time is 0.0312 seconds. %% below.
%% Code is not
%% the merit,
%% method is:
and
tic(); %% for/loop
for n = 1:length( startIndex ) %% may be
max( myArray( startIndex(n):endIndex(n) ) ); %% significantly
end %% faster than
toc(); %% vectorised
Elapsed time is 0.125 seconds. %% setup(s)
%% overhead(s)
%% As commented below,
%% subsequent re-runs yield unrealistic results due to caching artifacts
Elapsed time is 0 seconds.
Elapsed time is 0 seconds.
Elapsed time is 0 seconds.
%% which are not so straight visible if encapsulated in an artificial in-vitro
%% via an outer re-run repetitions ( for k=1:1000 ) et al ( ref. in text below )
For a better interpretation of the test results, rather test on much larger sizes than just on a few tens of rows/cols.
EDIT:
An erroneous code sample was removed, thanks Dan for the notice. I have since taken more care to emphasise quantitative validation: the assumption that vectorised code may, but need not in all circumstances, be faster is no excuse for faulty code, sure.
Output - quantitatively comparative data:
While re-runs are commonly recommended, it is IMHO not fair to assume the memalloc and similar overheads can be excluded from in-vivo testing. Test re-runs typically show VM-page-hit improvements and other caching artifacts, while the raw first "virgin" run is what typically appears in real code deployment ( excl. external iterators, for sure ). So consider the results with care and retest in your real environment ( sometimes being run as a Virtual Machine inside a bigger system -- which also makes VM-swap mechanics necessary to take into account once huge matrices start to hurt on real-life memory-access patterns ).
On other projects I am used to using [usec] granularity for real-time test timing, but the more care is then necessary about the test-execution conditions and the O/S background.
So nothing but testing gives relevant answers for your specific code/deployment situation; however, be methodic and compare data comparable in principle.
Alarik's code:
MATLAB:8> tic(); for k=1:1000 % ( flattens memalloc issues & al )
> for n = 1:length( startIndex )
> max( myArray( startIndex(n):endIndex(n) ) );
> end;
> end; toc()
Elapsed time is 0.2344 seconds.
%% time is 0.0002 seconds per k-for-loop <--[ ref.^ remarks on testing ]
Dan's code:
MATLAB:9> tic(); for k=1:1000
> s_ind( size( myLogicals ) ) = 0;
> s_ind( startIndex ) = 1;
> labelled = cumsum( s_ind(:) ).*myLogicals(:);
> result = accumarray( labelled + 1, myArray(:), [], @max );
> end; toc()
error: product: nonconformant arguments (op1 is 43x1, op2 is 45x1)
%%
%% [Work in progress] to find my mistake -- sorry for not being able to reproduce
%% Dan's code and to make it work
%%
%% Both myArray and myLogicals shape was correct ( 9 x 5 )

Repeat elements of vector [duplicate]

This question already has answers here:
Repeat copies of array elements: Run-length decoding in MATLAB
I have a value vector A containing elements i, for example:
A = [0.1 0.2 0.3 0.4 0.5];
and say r = [5 2 3 2 1];
Now I want to create a new vector Anew containing r(i) repetitions of each value A(i), such that the first r(1)=5 items in Anew have value A(1) and the length of the new vector is sum(r). Thus:
Anew = [0.1 0.1 0.1 0.1 0.1 0.2 0.2 0.3 0.3 0.3 0.4 0.4 0.5]
I am sure this can be done with an elaborate for-loop combining e.g. repmat, but any chance someone knows how to do this in a smoother way?
As far as I'm aware, there is no equivalent function to do that in MATLAB, though R has rep that can do that for you.... so jealous.
In any case, the only way I can suggest is to run a for loop with repmat as you suggested. However, you can perhaps do arrayfun instead if you want to do this as a one-liner... well, technically two lines, given the post-processing required to get this into a single vector. As such, you can try this:
Anew = arrayfun(@(x) repmat(A(x), r(x), 1), 1:numel(A), 'uni', 0);
Anew = vertcat(Anew{:});
This essentially does the for loop and concatenation of the replicated vectors with less code. We go through each pair of values in A and r and spit out replicated vectors. Each of them will be in a cell array, which is why vertcat is required to put it all into one vector.
We get:
Anew =
0.1000
0.1000
0.1000
0.1000
0.1000
0.2000
0.2000
0.3000
0.3000
0.3000
0.4000
0.4000
0.5000
Take note that other people have tried something similar to what you're doing in this post: A similar function to R's rep in Matlab. This is essentially mimicking R's way of doing rep, which is what you want to do!
Alternative - Using for loops
Because of @Divakar's benchmarking, I'm curious to see how pre-allocating the array and then using an actual for loop to iterate through A and r and populate it by indexing would benchmark. As such, the equivalent code to the above using for loops and indexing would be:
Anew = zeros(sum(r), 1);
counter = 1;
for idx = 1 : numel(r)
Anew(counter : counter + r(idx) - 1) = A(idx);
counter = counter + r(idx);
end
We would need a variable that keeps track of where we need to insert elements in the array, which is stored in counter. We offset this by the total number of elements to replicate per number, which is stored in each value of r.
As such, this method completely avoids using repmat and just uses indexing to generate our replicated vectors instead.
Benchmarking (à la Divakar)
Building on top of Divakar's benchmarking code, I actually tried running all of the tests on my machine, in addition to the for loop approach. I simply used his benchmarking code with the same test cases.
These are the timing results I get per algorithm:
Case #1 - N = 4000, max_repeat = 4000
------------------- With arrayfun
Elapsed time is 1.202805 seconds.
------------------- With cumsum
Elapsed time is 1.691591 seconds.
------------------- With bsxfun
Elapsed time is 0.835201 seconds.
------------------- With for loop
Elapsed time is 0.136628 seconds.
Case #2 - N = 10000, max_repeat = 1000
------------------- With arrayfun
Elapsed time is 2.117631 seconds.
------------------- With cumsum
Elapsed time is 1.080247 seconds.
------------------- With bsxfun
Elapsed time is 0.540892 seconds.
------------------- With for loop
Elapsed time is 0.127728 seconds.
In these cases, cumsum actually beats out arrayfun... which is what I originally expected. bsxfun beats everyone else out, except for the for loop. My guess is with the differing times in arrayfun between myself and Divakar, we are running our code on different architectures. I'm currently running my tests using MATLAB R2013a on a Mac OS X 10.9.5 MacBook Pro machine.
As we can see, the for loop is much quicker. I know for a fact that when it comes to indexing operations in a for loop, the JIT kicks in and gives you better performance.
First think of forming an index vector [1 1 1 1 1 2 2 3 3 3 4 4 5]. Noticing the regular increments here makes me think of cumsum: we can get these steps by putting ones at the correct location in a zeros vector: [1 0 0 0 0 1 0 1 0 0 1 0 1]. And that we can get by running another cumsum on the input list. After adjusting for end conditions and 1-based indexing, we get this:
B(cumsum(r) + 1) = 1;
idx = cumsum(B) + 1;
idx(end) = [];
A(idx)
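Putting it together as a runnable snippet with the example data (B is preallocated here for clarity; relying on MATLAB's automatic growth also works):
A = [0.1 0.2 0.3 0.4 0.5];
r = [5 2 3 2 1];
B = zeros(1, sum(r) + 1); %// one extra slot for the final marker
B(cumsum(r) + 1) = 1;     %// a 1 just after the end of each run
idx = cumsum(B) + 1;      %// [1 1 1 1 1 2 2 3 3 3 4 4 5 6]
idx(end) = [];            %// drop the out-of-range trailing index
Anew = A(idx)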
bsxfun based approach -
A = [0.1 0.2 0.3 0.4 0.5]
r = [5 2 3 2 1]
repeats = bsxfun(@le,[1:max(r)]',r) %// logical 2D array with ones in each column
%// same as the repeats for each entry
A1 = A(ones(1,max(r)),:) %// 2D matrix of all entries repeated maximum r times
%// and this resembles your repmat
out = A1(repeats) %// desired output with repeated entries
It could essentially become a two-liner -
A1 = A(ones(1,max(r)),:);
out = A1(bsxfun(@le,[1:max(r)]',r));
Output -
out =
0.1000
0.1000
0.1000
0.1000
0.1000
0.2000
0.2000
0.3000
0.3000
0.3000
0.4000
0.4000
0.5000
Benchmarking
Some benchmark results could be produced for the solutions presented here thus far.
Benchmarking Code - Case I
%// Parameters and input data
N = 4000;
max_repeat = 4000;
A = rand(1,N);
r = randi(max_repeat,1,N);
num_runs = 10; %// no. of times each solution is repeated for better benchmarking
disp('------------------- With arrayfun')
tic
for k1 = 1:num_runs
Anew = arrayfun(@(x) repmat(A(x), r(x), 1), 1:numel(A), 'uni', 0);
Anew = vertcat(Anew{:});
end
toc, clear Anew
disp('------------------- With cumsum')
tic
for k1 = 1:num_runs
B(cumsum(r) + 1) = 1;
idx = cumsum(B) + 1;
idx(end) = [];
out1 = A(idx);
end
toc,clear B idx out1
disp('------------------- With bsxfun')
tic
for k1 = 1:num_runs
A1 = A(ones(1,max(r)),:);
out2 = A1(bsxfun(@le,[1:max(r)]',r));
end
toc
Results
------------------- With arrayfun
Elapsed time is 2.198521 seconds.
------------------- With cumsum
Elapsed time is 5.360725 seconds.
------------------- With bsxfun
Elapsed time is 2.896414 seconds.
Benchmarking Code - Case II [Bigger datasize but lesser max of r]
%// Parameters and input data
N = 10000;
max_repeat = 1000;
Results
------------------- With arrayfun
Elapsed time is 2.641980 seconds.
------------------- With cumsum
Elapsed time is 3.426921 seconds.
------------------- With bsxfun
Elapsed time is 1.858007 seconds.
Conclusions from benchmarks
For case I, arrayfun seems like the way to go, while for Case II, bsxfun might be the weapon of choice. So, it seems that the type of data you are dealing with, would really dictate which approach to go with.

Matlab Vector Interval

I'm working with numerical methods and decided to begin learning with the MATLAB environment. My question is: how can I make the generated interval of my vector include the last number of the interval? E.g.:
vector = [-2.4:2.4]
this will result in these numbers inside the vector:
-2.4000 -1.4000 -0.4000 0.6000 1.6000
so I want to know what my options are for doing this:
-2.4000 -1.4000 -0.4000 0.6000 1.6000 2.4000
I need the interval between the numbers to be 1, and I don't know the exact size of the vector, so I can't use the linspace function. Before coming here to ask I already searched, but didn't really find something that could help me.
If the difference between the first element and the last element is not a multiple of 1, you cannot have an interval of exactly 1 between all numbers. However, if your goal is to ensure that the last element is some particular number and you are willing to compromise (it seems you are, with 1.6 and 2.4), how about constructing v like this?
v1 = -2.4; v_last = 2.4;
v = v1 : v_last;
if v(end) ~= v_last
v = [v, v_last];
end
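If instead you prefer evenly spaced points with spacing as close to 1 as possible (rather than an irregular last step), linspace works once you compute the count; a sketch:
v1 = -2.4; v_last = 2.4;
n = round(v_last - v1) + 1;  %// number of points for roughly unit spacing
v = linspace(v1, v_last, n)  %// spacing here is 4.8/5 = 0.96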

Calculating group mean/medians in MATLAB where group ID is in a separate column

I have one column which contains the group ID of each participant. There are three groups so every number in this column is 1, 2 or 3.
Then I have a second column which contains response scores for each participant. I want to calculate the mean/median response score within each group.
I have managed to do this by looping through every row but I sense this is a slow and suboptimal solution. Could someone please suggest a better way of doing things?
grpstats is a good function to use here (see its documentation).
This is the list of built-in statistics:
'mean' Mean
'sem' Standard error of the mean
'numel' Count, or number, of non-NaN elements
'gname' Group name
'std' Standard deviation
'var' Variance
'min' Minimum
'max' Maximum
'range' Range
'meanci' 95% confidence interval for the mean
'predci' 95% prediction interval for a new observation
and it also accepts function handles (e.g. @mean, @skewness)
>> groups = [1 1 1 2 2 2 3 3 3]';
>> data = [0 0 1 0 1 1 1 1 1]';
>> grpstats(data, groups, {'mean'})
ans =
0.3333
0.6667
1.0000
>> [mea, med] = grpstats(data, groups, {'mean', @median})
mea =
0.3333
0.6667
1.0000
med =
0
1
1
This is a good place to use accumarray (see its documentation and the related blog post):
result = accumarray(groupIDs, data, [], @median);
You can of course give a row or column of a matrix instead of a variable called groupIDs, and another for data. If you'd prefer the mean instead of the median, use @mean as the 4th arg.
Note: the documentation notes that you should sort the input parameters if you need to rely on the order of the output. I'll leave that exercise for another day though.
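To pair each result with its group ID regardless of input ordering, one option (a sketch mirroring the unique/accumarray pattern used earlier on this page) is:
[gids, ~, subs] = unique(groupIDs); %// gids is sorted; subs maps each row to its group
stats = accumarray(subs, data, [], @median);
result = [gids, stats] %// group ID next to its median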
Use logical indexing. For example, say your data is in a matrix m where the first column is the ID and the second column is the response score; then
mean(m(m(:,1)==1,2))
median(m(m(:,1)==1,2))
will give you the mean and median of the response scores for group 1, and so on for the other groups, as sketched below.
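To collect all three groups at once without repeating the expression, a sketch using arrayfun over the group IDs:
grpIDs = unique(m(:,1)); %// e.g. [1; 2; 3]
grpMeans = arrayfun(@(g) mean(m(m(:,1)==g, 2)), grpIDs)
grpMedians = arrayfun(@(g) median(m(m(:,1)==g, 2)), grpIDs)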

reformatting a matrix in matlab with nan values

This post follows a previous question regarding the restructuring of a matrix:
re-formatting a matrix in matlab
An additional problem I face is demonstrated by the following example:
depth = [0:1:20]';
data = rand(1,length(depth))';
d = [depth,data];
d = [d;d(1:20,:);d];
Here I would like to alter this matrix so that each column represents a specific depth and each row represents time, so eventually I will have 3 rows (i.e. days) and 21 columns (i.e. a measurement at each depth). However, we cannot simply reshape this, because the number of measurements is not the same for every day, i.e. some are missing. This can be seen by:
dd = sortrows(d,1);
for i = 1:length(depth);
e(i) = length(dd(dd(:,1)==depth(i),:));
end
From e we find that the number of measurements differs between depths, i.e. some days are missing certain depths. How could I insert NaNs into the matrix so that each day has the same depth values? I could find the unique depths first by:
unique(d(:,1))
From this, if a depth (from unique) is missing for a given day I would like to insert the depth to the correct position and insert a nan into the respective location in the column of data. How can this be achieved?
You were thinking correctly that unique may come in handy here. You also need the third output argument, which maps the unique depths onto the positions in the original d array. Have a look at this code - the comments explain what I do.
% find unique depths and their mapping onto the d array
[depths, ~, j] = unique(d(:,1));
% find the start of every day of measurements
% the assumption here is that the depths for each day are in increasing order
days_data = [1; diff(d(:,1))<0];
% count the number of days
ndays = sum(days_data);
% map every entry in d to the correct day
days_data = cumsum(days_data);
% construct the output array full of nans
dd = nan(numel(depths), ndays);
%// assign the existing measurements using linear indices;
%// where data does not exist, NaN will remain
dd(sub2ind(size(dd), j, days_data)) = d(:,2)
dd =
0.5115 0.5115 0.5115
0.8194 0.8194 0.8194
0.5803 0.5803 0.5803
0.9404 0.9404 0.9404
0.3269 0.3269 0.3269
0.8546 0.8546 0.8546
0.7854 0.7854 0.7854
0.8086 0.8086 0.8086
0.5485 0.5485 0.5485
0.0663 0.0663 0.0663
0.8422 0.8422 0.8422
0.7958 0.7958 0.7958
0.1347 0.1347 0.1347
0.8326 0.8326 0.8326
0.3549 0.3549 0.3549
0.9585 0.9585 0.9585
0.1125 0.1125 0.1125
0.8541 0.8541 0.8541
0.9872 0.9872 0.9872
0.2892 0.2892 0.2892
0.4692 NaN 0.4692
You may want to transpose the matrix (dd = dd.') so that rows correspond to days, as you described.
It's not entirely clear from your question what your data looks like exactly, but the following might help you towards an answer.
Suppose you have a column vector
day1 = (1:21)';
and, initially, all the values are NaN
day1(:) = NaN
Suppose next that you have a 2d array of measurements, in which the first column represents depths, and the second the measurements at those depths. For example
msrmnts = [1,2;2,3;4,5;6,7] % etc
then the assignment
day1(msrmnts(:,1)) = msrmnts(:,2)
will set values in only those rows of day1 whose indices are found in the first column of msrmnts. This second statement uses Matlab's capabilities for using one array as a set of indices into another array, for example
d([9 7 8 12 4]) = 1:5
would set elements [9 7 8 12 4] of d to the values 1:5. Note that the indices of the elements do not need to be in order. You could even insert the same value several times into the index array, e.g. [4 4 5 6 3 4], though it's not terribly useful.
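Putting the pieces together for the original problem, a minimal sketch (using the assumed msrmnts example above and 21 depths):
day1 = nan(21, 1);                 %// start with an all-NaN column for the day
day1(msrmnts(:,1)) = msrmnts(:,2); %// fill only the measured depths
%// depths not listed in msrmnts(:,1) stay NaN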