Aligning multiple arrays in a cell array by prepending/postpending NaNs


I am trying to align the arrays in a cell array, prepending/appending NaNs so that they all end up the same length. For example:
%Setting up data
A = [0.01 0.02 0.03 0.01 0.60 0.90 -1.02];
B = [0.03 0.01 0.60 0.90];
C = [0.03 0.01 0.60 0.90 -1.02 0.03 -1.02];
CellABC = {A, B, C};
The expected output is this:
CellABC = {[0.01 0.02 0.03 0.01 0.60 0.90 -1.02 NaN NaN], ...
           [NaN NaN 0.03 0.01 0.60 0.90 NaN NaN NaN], ...
           [NaN NaN 0.03 0.01 0.60 0.90 -1.02 0.03 -1.02]};
This is just an example. In my actual data, I have a 1x100 cell-array containing arrays of sizes ranging from 1x400 to 1x1400.
I have tried this:
[~, idx] = max(cellfun(@numel, CellABC)); %index of maximum no. of entries in CellABC
for i=1:length(CellABC)
[d1, d2] = findsignal(CellABC{idx},CellABC{i},'Metric','absolute');
tmp = NaN(size(CellABC{idx})); %initializing with NaNs
tmp(d1:d2) = CellABC{i}; %saving the array as per indices of found values
CellABC{i} = tmp; %Updating the cell array
end
This aligns CellABC{2} correctly, but the number of appended NaNs is wrong. It also does not append NaNs to the end of CellABC{1} or prepend NaNs to the start of CellABC{3}. I understand why: findsignal is not suitable here, because none of the arrays contains the complete data, so there is no array to use as the first input argument (the reference signal) of findsignal.
I have also looked into the alignsignals function, but it only handles two signals, and I cannot figure out how to apply it to 100 signals as in my case.
How could this problem be solved?

It's relatively simple for the example data, but if the real data is too fragmented you may need more than one template (reference array), aligned in multiple passes.
A = [0.01 0.02 0.03 0.01 0.60 0.90 -1.02];
B = [0.03 0.01 0.60 0.90];
C = [0.03 0.01 0.60 0.90 -1.02 0.03 -1.02];
CellABC = {A, B, C};
% find longest anyway
[~,I]=max(cellfun(@numel,CellABC));
% find lags and align in two passes
% 1st pass
lags=zeros(numel(CellABC),1);
for idx=1:numel(CellABC)
if idx==I, continue; end
[r,lag]=xcorr(CellABC{I},CellABC{idx});
[~,lagId]=max(r);
lags(idx)=lag(lagId);
end
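% shift all lags so the earliest-starting array begins in column 1 (lags can be negative)
lags=lags-min(lags);
% 2nd pass
out=nan(numel(CellABC),max(arrayfun(@(x) numel(CellABC{x})+lags(x),1:numel(CellABC))));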
for idx=1:numel(CellABC)
out(idx,lags(idx)+1:lags(idx)+numel(CellABC{idx}))=CellABC{idx};
end
out =
0.0100 0.0200 0.0300 0.0100 0.6000 0.9000 -1.0200 NaN NaN
NaN NaN 0.0300 0.0100 0.6000 0.9000 NaN NaN NaN
NaN NaN 0.0300 0.0100 0.6000 0.9000 -1.0200 0.0300 -1.0200
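For comparison, here is a sketch using a different alignment criterion, swapped in rather than taken from the answer above: instead of the xcorr peak, each array gets the lag that minimizes the mean absolute difference over its overlap with the longest array, so no Signal Processing Toolbox is needed. It assumes the fragments are (near-)exact copies of the same underlying data, and minOverlap (at least half of each array must overlap the reference) is a hypothetical choice that keeps very short overlaps from matching by chance.

```matlab
% Example data from the question
A = [0.01 0.02 0.03 0.01 0.60 0.90 -1.02];
B = [0.03 0.01 0.60 0.90];
C = [0.03 0.01 0.60 0.90 -1.02 0.03 -1.02];
CellABC = {A, B, C};

% Use the longest array as the reference
[~, I] = max(cellfun(@numel, CellABC));
ref = CellABC{I};

lags = zeros(1, numel(CellABC));
for idx = 1:numel(CellABC)
    y = CellABC{idx};
    minOverlap = ceil(numel(y) / 2);   % assumption: at least half of y must overlap ref
    bestCost = Inf;
    for k = -(numel(y) - minOverlap) : (numel(ref) - minOverlap)
        % With y(1) placed at ref(k+1), y(iy) overlaps ref(iy + k)
        iy = max(1, 1 - k) : min(numel(y), numel(ref) - k);
        cost = mean(abs(ref(iy + k) - y(iy)));
        if cost < bestCost              % strict '<' keeps the first best lag
            bestCost = cost;
            lags(idx) = k;
        end
    end
end

% Shift so the earliest-starting array begins in column 1
lags = lags - min(lags);

% Build the NaN-padded output, one aligned array per row
width = max(cellfun(@numel, CellABC) + lags);
out = NaN(numel(CellABC), width);
for idx = 1:numel(CellABC)
    out(idx, lags(idx) + (1:numel(CellABC{idx}))) = CellABC{idx};
end
disp(out)
```

The lag search is quadratic in the array length per array, which is likely acceptable for 100 arrays of 400 to 1400 samples, but it is worth timing on the real data.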

Related

Inserting a struct into a vector?

I have a 175x1 vector of probabilities, v, and a struct with a vector in it, called data.x, which is 8156x1 and has numbers from 0-400.
In code provided to me, they do the following:
v(data.x);
The result is an 8156x1 vector. I have no idea what this does to the data, and I have not been able to reproduce the result.
Any help is appreciated.

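It looks like data.x is a vector of indices into v: v(data.x) returns the elements of v at the positions listed in data.x, so the result has the same size as data.x (8156x1). I am surprised that data.x has values from 0 to 400, though: any value greater than 175 (the length of v) will cause an index-out-of-bounds error, and so will 0, because MATLAB indices start at 1.
For example: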
v = [0.4 0.2 0.1 0.44 0.25 0.9 0.91]';
data.x = [1 3 2 5 2]';
v(data.x)
ans =
0.4000
0.1000
0.2000
0.2500
0.2000
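To see which entries of data.x would trigger that error, a small check can flag indices that are not positive integers in the range 1..numel(v). The data.x below is hypothetical (it deliberately contains 0 and 9), and returning NaN for invalid indices is only one possible policy.

```matlab
v = [0.4 0.2 0.1 0.44 0.25 0.9 0.91]';
data.x = [1 3 0 2 9 5]';   % hypothetical: contains the invalid indices 0 and 9

% Valid linear indices are positive integers no larger than numel(v)
valid = data.x >= 1 & data.x <= numel(v) & data.x == round(data.x);
badPositions = find(~valid);   % positions in data.x that would cause an error

% One possible policy (an assumption): NaN wherever the index is invalid
result = NaN(size(data.x));
result(valid) = v(data.x(valid));
disp(result)
```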

How to exclude NaNs from ranking a vector

We're working on MATLAB code to rank stocks. We do not have a full dataset and therefore have to cope with some NaNs. However, in the code we use for sorting, the NaNs are ranked highest. We want to exclude the NaNs from the ranking. How can we do this?
Consider the example below with Y and stockid:
Y = [1.2 1.3 NaN 0.9 0.95 NaN 0.8 0.7];
stockid = [801 802 803 804 805 806 807 808];
[totalmonths,totalstocks] = size(Y);
nbrstocks = totalstocks - sum(isnan(Y));
[B,I] = sort(Y,'descend');
ncandidates = 4;
idwinner(1:ncandidates) = stockid(I(1:ncandidates));
Running the program results in:
Y =
1.2000 1.3000 NaN 0.9000 0.9500 NaN 0.8000 0.7000
idwinner =
803 806 802 801
So 803 and 806 correspond to NaN, 802 to 1.3, and so on.
The result we're aiming for should be like this:
Y =
1.2000 1.3000 NaN 0.9000 0.9500 NaN 0.8000 0.7000
idwinner =
802 801 805 804
So, how can we exclude the NaNs from the ranking?

Use
Y(isnan(Y)) = -inf;
before calling sort. That will change NaN values into -inf, and thus those values will be the lowest.
Alternatively, if you don't want to change any value in Y, you can use an intermediate index as follows:
Y = [1.2 1.3 NaN 0.9 0.95 NaN 0.8 0.7];
stockid = [801 802 803 804 805 806 807 808];
ind = find(~isnan(Y)); %// intermediate index that tells which elements are numbers
[B,I] = sort(Y(ind),'descend');
ncandidates = 4;
idwinner(1:ncandidates) = stockid(ind(I(1:ncandidates))); %// apply intermediate index

After your sort statement, add the line I = I(~isnan(B)); to remove the indices associated with NaNs before you use them to select from stockid.

I = I(~isnan(B));
works best, since we then do not overwrite the NaNs, as is the case when using
Y(isnan(Y)) = -inf;
We also have to determine the loser portfolios later on, from the stocks with the lowest returns. The -inf approach does not work well for that, because all the NaNs would then have the lowest returns instead of the stocks with actual data.
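Combining I = I(~isnan(B)); with the loser portfolio mentioned above, a minimal sketch could look like this. It assumes Y contains at least ncandidates non-NaN values; idloser is a hypothetical name, listing the lowest returns first.

```matlab
Y = [1.2 1.3 NaN 0.9 0.95 NaN 0.8 0.7];
stockid = [801 802 803 804 805 806 807 808];
ncandidates = 4;

% Sort in descending order, then drop the positions that hold NaN
[B, I] = sort(Y, 'descend');
I = I(~isnan(B));

% Winners: highest returns first
idwinner = stockid(I(1:ncandidates));
% Losers (hypothetical name): lowest returns first, read from the end of the NaN-free order
idloser = stockid(I(end:-1:end - ncandidates + 1));
```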

Assigning 0 value to missing element: MATLAB

I have two matrices, A and B:
A = [NaN NaN NaN 0.61 NaN 0.6
NaN 2.14 NaN 0.57 NaN 0.83
NaN 5.11 NaN 2.45 NaN 2.35
NaN 10.93 NaN 5.58 6.13 5.95];
B = [0.76 2.24 1.89 0.61 -0.46 0.6
1.30 2.14 2.93 0.57 0.65 0.83
2.29 5.11 4.88 2.45 1.71 2.35
6.65 10.93 9.39 5.58 6.13 5.95]
Matrix B contains the values of A with the missing (NaN) entries imputed. I need to find the elements that were imputed, i.e. those that are NaN in A, and set any of them that are negative to 0 in B. For example, the element at (1,5) has a value of -0.46 and was NaN in the original matrix A, so it should be set to 0 in B.

B(isnan(A) & (B < 0)) = 0;
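The mask isnan(A) & (B < 0) selects only the elements that were NaN in A and are negative in B, so negative values that were actually observed (not imputed) are left alone. A toy example with made-up numbers:

```matlab
A = [NaN -1; 2 NaN];   % original data with missing values (toy example)
B = [-3 -1; 2 4];      % the same data with the NaNs imputed
imputed = isnan(A);     % true where B holds an imputed value
B(imputed & (B < 0)) = 0;
disp(B)                 % -1 was observed, not imputed, so it is kept
```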

How to Replace Values Exceeding Threshold with Random Number Sampled from a Given Dataset?

I have a 3-dimensional array called 'simulatedReturnsEVT3'. I would like to replace all of its values that are higher than 'MaxAcceptableVal' or lower than 'MinAcceptableVal'. Each value beyond either threshold should be replaced by a random number drawn from the 3-dimensional array 'data2'. To draw that random number, I use the MATLAB function 'datasample'.
I have written the code below, which replaces the values beyond either threshold with a random number sampled from 'data2'. However, when I plot the data in a histogram, it seems that the replacement uses the same value along dimension 'j'. That is not what I want: for every threshold exceedance, I want a new random number to be drawn from 'data2'.
nIndices = 19
nTrials = 10000
% data2 has dimensions 782 x 19 x 10000
% simulatedReturnsEVT3 has dimensions 312 x 19 x 10000
% MaxAcceptableVal has dimensions 1 x 19
% MinAcceptableVal has dimensions 1 x 19
% Cut off Outliers
for i=1:nIndices
for j=1:nTrials
sliceEVT = simulatedReturnsEVT3(:,i,j);
sliceEVT(sliceEVT < MinAcceptableVal(i))=datasample (data2(:,i,j), 1,1,'Replace',false);
sliceEVT(sliceEVT > MaxAcceptableVal(i))=datasample (data2(:,i,j), 1,1,'Replace',false);
simulatedReturnsEVT3(:,i,j) = sliceEVT;
end
end
The same problem can be illustrated on a smaller scale by creating the following matrices.
% Set Maximum Acceptable Levels for Positive and Negative Returns
MaxAcceptableVal = [0.5 0.3]
MinAcceptableVal = [-0.5 -0.3]
simulatedReturnsEVT3 = [0.6 0.3; 0.3 0.3; 0.3 0.3; 0.3 0.4]
simulatedReturnsEVT3 = repmat(simulatedReturnsEVT3,[1 1 2])
data2 = [0.25 0.15; 0.25 0.15; 0.2 0.1]
data2 = repmat(data2,[1 1 2])
% Cut off Outliers
for i=1:2
for j=1:2
sliceEVT = simulatedReturnsEVT3(:,i,j);
sliceEVT(sliceEVT < MinAcceptableVal(i))=datasample (data2(:,i,j), 1,1,'Replace',false);
sliceEVT(sliceEVT > MaxAcceptableVal(i))=datasample (data2(:,i,j), 1,1,'Replace',false);
simulatedReturnsEVT3(:,i,j) = sliceEVT;
end
end
Can anybody help?

If I've understood the problem, it seems it is related to the usage of datasample.
In your code you use:
datasample (data2(:,i,j), 1,1,'Replace',false);
In this call, the first "1" is the number of samples to extract, so only one value is drawn.
If more than one value has to be replaced in simulatedReturnsEVT3, all of them are replaced by that same single value, because assigning a scalar to several indexed elements sets every one of them to that scalar.
Again, if I've understood the problem, you should call datasample with the number "n" of values needed to replace the out-of-bounds values in simulatedReturnsEVT3:
datasample(data2(:,i,j), n, 1, 'Replace', false)
To test this solution, I modified MaxAcceptableVal so that more values in simulatedReturnsEVT3 are out of bounds:
MaxAcceptableVal = [0.5 0.2]
These are the values of simulatedReturnsEVT3 before the replacement:
val(:,:,1) =
0.6000 0.3000
0.3000 0.3000
0.3000 0.3000
0.3000 0.4000
val(:,:,2) =
0.6000 0.3000
0.3000 0.3000
0.3000 0.3000
0.3000 0.4000
These are the values after the replacement:
val(:,:,1) =
0.2500 0.1500
0.3000 0.1000
0.3000 0.1500
0.3000 0.1000
val(:,:,2) =
0.2000 0.1000
0.3000 0.1500
0.3000 0.1500
0.3000 0.1000
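This is the updated code. Note that data2 now has four rows: with 'Replace', false, datasample cannot draw more values than the sampled column contains, and with the new MaxAcceptableVal all four values in the second column must be replaced.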
% Set Maximum Acceptable Levels for Positive and Negative Returns
% MaxAcceptableVal = [0.5 0.3]
MaxAcceptableVal = [0.5 0.2]
MinAcceptableVal = [-0.5 -0.3]
simulatedReturnsEVT3 = [0.6 0.3; 0.3 0.3; 0.3 0.3; 0.3 0.4]
simulatedReturnsEVT3 = repmat(simulatedReturnsEVT3,[1 1 2])
data2 = [0.2 0.1; 0.25 0.15; 0.25 0.15; 0.2 0.1]
data2 = repmat(data2,[1 1 2])
% Cut off Outliers
for i=1:2
for j=1:2
sliceEVT = simulatedReturnsEVT3(:,i,j)
% Identify the index of the values to be replaced
idx=find(sliceEVT < MinAcceptableVal(i))
% Evaluate how many values have to be replaced
n=length(idx)
% Extract and assign the number from "data2"
sliceEVT(idx)=datasample (data2(:,i,j), n,1,'Replace',false)
% Identify the index of the values to be replaced
idx=find(sliceEVT > MaxAcceptableVal(i))
% Evaluate how many values have to be replaced
n=length(idx)
% Extract and assign the number from "data2"
sliceEVT(idx)=datasample (data2(:,i,j), n,1,'Replace',false)
simulatedReturnsEVT3(:,i,j) = sliceEVT
end
end
Hope this helps.
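Two refinements may be worth considering (a sketch, not part of the answer above): test both thresholds with a single mask, so a value drawn during the first replacement is not tested and replaced again by the second, and use base-MATLAB randperm instead of datasample to draw n distinct values without needing the Statistics and Machine Learning Toolbox. The slice, pool and thresholds below are made-up numbers.

```matlab
% Toy slice and pool of replacement values (made-up numbers)
sliceEVT = [0.6; -0.7; 0.1; 0.8];
pool = [0.11; 0.22; 0.33; 0.44; 0.55];
MinAcceptable = -0.5;
MaxAcceptable = 0.5;

% One mask for both thresholds, so each original value is tested exactly once
outOfBounds = sliceEVT < MinAcceptable | sliceEVT > MaxAcceptable;
n = nnz(outOfBounds);

% n distinct draws without replacement (requires n <= numel(pool))
sliceEVT(outOfBounds) = pool(randperm(numel(pool), n));
disp(sliceEVT)
```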

MATLAB remove NaN values from matrix and shift values left

I am trying to compute column-wise differences in the following matrix:
A =
0 NaN NaN 0.3750 NaN
NaN 0.1250 0.2500 0.3750 NaN
I would like to obtain:
0.3750 NaN NaN
0.1250 0.1250 0.1250
That is, I am taking the difference between successive non-NaN values in each row, skipping NaN values and shifting the results to the left.
A one-dimensional case would be:
A = [0 NaN 0.250 0.375 NaN 0.625];
NaN_diff(A) = [0.250 0.125 0.250];
Is there a way to do this efficiently in MATLAB without an inefficient find() call for each row?

Here's a solution that vectorizes most of the operations:
notNan = ~isnan(A);
numNN = sum(notNan,2);
shifted = NaN(size(A));
for r = 1:size(A,1)
myRow = A(r,:);
shifted(r,1:numNN(r)) = myRow(notNan(r,:));
end
nanDiff = diff(shifted,1,2);
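The per-row loop can also be removed entirely (a sketch, not from the answer above). It relies on MATLAB's sort being stable: sorting each row's NaN flags in ascending order moves the non-NaN entries to the front while preserving their order, and the returned indices are then used to gather the values.

```matlab
A = [0 NaN NaN 0.3750 NaN; NaN 0.1250 0.2500 0.3750 NaN];

% 0 for numbers, 1 for NaN; a stable ascending sort moves the numbers to the front
[~, order] = sort(double(isnan(A)), 2);

% Turn the per-row column order into linear indices and gather the values
rowIdx = repmat((1:size(A,1))', 1, size(A,2));
shifted = A(sub2ind(size(A), rowIdx, order));

% NaNs now sit at the end of each row, so diff only produces NaN there
nanDiff = diff(shifted, 1, 2);
disp(nanDiff)
```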

Here is an alternative vectorized solution:
%// Convert to cell array without NaNs
[rows, cols] = size(A);
C = cellfun(@(x)x(~isnan(x)), mat2cell(A, ones(1, rows), cols), 'UniformOutput', false);
%// Compute diff for each row and pad
N = max(sum(~isnan(A), 2));
C = cellfun(@(x)[diff(x) nan(1, N - length(x))], C, 'UniformOutput', false);
%// Convert back to a matrix
nandiff = vertcat(C{:});
If you want to pad the result matrix with zeroes instead of NaN values, change the nan function call in nan(1, N - length(x)) to zeros.

Here is an alternative method that does require you to loop over each row, but should still perform reasonably well and feels very intuitive to me:
B = NaN(size(A,1), size(A,2)-1);
for i = 1:size(A,1)
    idx = ~isnan(A(i,:));                 % non-NaN entries of row i
    B(i,1:sum(idx)-1) = diff(A(i,idx));   % k values give k-1 differences
end

I'm aware that this is a rather old question, but for people like me who stumble onto this page, here is a simpler (imho) solution for the one-dimensional case:
A = [0 NaN 0.250 0.375 NaN 0.625];
A(isnan(A))=[]; % identify index of NaN values and remove them from the array
B = diff(A);

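Here is another simple solution without a loop (assuming the non-NaN values in each row are in ascending order, and treating a NaN in the first column as 0):
A = [0 NaN NaN 0.3750 NaN; NaN 0.1250 0.2500 0.3750 NaN]
A(isnan(A(:,1)),1) = 0;   % replace NaNs in the first column with 0
B = sort(A,2);            % ascending sort moves the NaNs to the end of each row
C = diff(B,1,2)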