This is easy in two dimensions, for example:
>> A = NaN(5,4)
>> A(2:4,2:3) = [1 2; 3 4; 5 6]
>> A(2,2) = NaN
>> A(4,3) = NaN
A =
NaN NaN NaN NaN
NaN NaN 2 NaN
NaN 3 4 NaN
NaN 5 NaN NaN
NaN NaN NaN NaN
>> A(~all(isnan(A),2),~all(isnan(A),1))
ans =
NaN 2
3 4
5 NaN
Note that NaN values in rows and columns that are not all NaN are retained.
How can this be extended to more dimensions? For example, if A has three dimensions:
>> A = NaN(5,4,3)
>> A(2:4,2:3,2) = [1 2; 3 4; 5 6]
>> A(2,2,2) = NaN
>> A(4,3,2) = NaN
A(:,:,1) =
NaN NaN NaN NaN
NaN NaN NaN NaN
NaN NaN NaN NaN
NaN NaN NaN NaN
NaN NaN NaN NaN
A(:,:,2) =
NaN NaN NaN NaN
NaN NaN 2 NaN
NaN 3 4 NaN
NaN 5 NaN NaN
NaN NaN NaN NaN
A(:,:,3) =
NaN NaN NaN NaN
NaN NaN NaN NaN
NaN NaN NaN NaN
NaN NaN NaN NaN
NaN NaN NaN NaN
How do I then get
ans =
NaN 2
3 4
5 NaN
I'd like to do this in four dimensions, and with much larger matrices than the example matrix A here.
My solution to the problem based on the input A as posted by OP:
>> [i,j,k] = ind2sub(size(A),find(~isnan(A)));
>> l = min([i j k]);
>> u = max([i j k]);
>> B=A(l(1):u(1),l(2):u(2),l(3):u(3))
B =
NaN 2
3 4
5 NaN
>> size(B)
ans =
3 2
Since you stated that you want to do this on much larger matrices, I'm not sure about the performance of @ronalchn's solution, with all of its all() calls. But I have no idea to what extent that matters; maybe someone can comment.
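The same bounding-box idea can be written without hard-coding the number of dimensions. The following is only a sketch: the names subs, lo, hi and ranges are made up for illustration, and it assumes A contains at least one non-NaN element. Note that a bounding box keeps all-NaN rows or columns that lie between non-NaN data, so it is not always equivalent to removing every all-NaN slice.

```matlab
% Example input from the question
A = NaN(5,4,3);
A(2:4,2:3,2) = [1 2; 3 4; 5 6];
A(2,2,2) = NaN;
A(4,3,2) = NaN;

% One subscript vector per dimension (assumes at least one non-NaN element)
subs = cell(1, ndims(A));
[subs{:}] = ind2sub(size(A), find(~isnan(A)));

% Lower and upper bound of the non-NaN data along each dimension
lo = cellfun(@min, subs);
hi = cellfun(@max, subs);

% Build one index range per dimension and crop
ranges = arrayfun(@(a, b) a:b, lo, hi, 'UniformOutput', false);
B = A(ranges{:});
```

Because the subscripts are collected in a cell array, the same lines should work unchanged for a four-dimensional A.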
Try this:
2 dimensions
A(~all(isnan(A),2),~all(isnan(A),1))
3 dimensions
A(~all(all(isnan(A),2),3),...
~all(all(isnan(A),1),3),...
~all(all(isnan(A),1),2))
4 dimensions
A(~all(all(all(isnan(A),2),3),4),...
~all(all(all(isnan(A),1),3),4),...
~all(all(all(isnan(A),1),2),4),...
~all(all(all(isnan(A),1),2),3))
Basically, the rule for N dimensions is:
Apply isnan() once for each of the N dimensions.
Then wrap each result in the all() function N-1 times.
For the ith dimension, the second arguments of the all() calls should be the numbers 1 to N in any order, excluding i.
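The rule can also be written as a loop over the dimensions, so that nothing has to be typed out per dimension. This is a minimal sketch, not a tuned implementation; nanMask, keep and the loop variables are names chosen for illustration.

```matlab
% Example input from the question
A = NaN(5,4,3);
A(2:4,2:3,2) = [1 2; 3 4; 5 6];
A(2,2,2) = NaN;
A(4,3,2) = NaN;

nanMask = isnan(A);          % computed once and reused for every dimension
N = ndims(A);
keep = cell(1, N);
for d = 1:N
    m = nanMask;
    for other = [1:d-1, d+1:N]
        m = all(m, other);   % collapse every dimension except d
    end
    keep{d} = ~m(:);         % true where the slice along d is not all NaN
end
B = A(keep{:});
```

For the example this should give the same 3-by-2 result as the hand-written three-dimensional expression.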
Since Theodros Zelleke wants to see whose method is faster (a nice way of saying he thinks his method is faster), here's a benchmark. Matrix A is defined as:
A = NaN*ones(100,400,3,3);
A(2:4,2:3,2,2) = [1 2; 3 4; 5 6];
A(2,2,2,2) = NaN;A(4,3,2,2) = NaN;
A(5:80,4:200,2,2)=ones(76,197);
His test is defined as:
tic;
for n=1:100 % loop counter renamed so that ind2sub below does not overwrite it
[i,j,k,z] = ind2sub(size(A),find(~isnan(A)));
l = min([i j k z]);
u = max([i j k z]);
B=A(l(1):u(1),l(2):u(2),l(3):u(3),l(4):u(4));
end
toc
With results:
Elapsed time is 0.533932 seconds.
Elapsed time is 0.519216 seconds.
Elapsed time is 0.575037 seconds.
Elapsed time is 0.525000 seconds.
My test is defined as:
tic;
for i=1:100
isnanA=isnan(A);
ai34=all(all(isnanA,3),4);
ai12=all(all(isnanA,1),2);
B=A(~all(ai34,2),~all(ai34,1),~all(ai12,4),~all(ai12,3));
end
toc
With results:
Elapsed time is 0.224869 seconds.
Elapsed time is 0.225132 seconds.
Elapsed time is 0.246762 seconds.
Elapsed time is 0.236989 seconds.
I am looking for an efficient, vectorized algorithm to compute a histogram of gap (NaN) widths, defined as follows:
signals are represented by an (Nsamples x Nsig) array
gaps in a signal are encoded by NaNs
width of a gap: the number of consecutive NaNs in the signal
gap width histogram: the frequency of gaps of each specific width in the signals
The following conditions must hold:
[Nsamples,Nsig ]= size(signals)
isequal(size(signals),size(gapwidthhist)) % true
isequal(sum(gapwidthhist.*(1:Nsamples)',1),sum(isnan(signals),1)) % true
Of course, a compressed form of gapwidthhist (represented by two cell arrays, "gapwidthhist_compressed_widths" and "gapwidthhist_compressed_freqs") is required too.
Example:
signals = [1.1 NaN NaN NaN -1.4 NaN 8.3 NaN NaN NaN NaN 1.5 NaN NaN; % signal No. 1
NaN 2.2 NaN 4.9 NaN 8.2 NaN NaN NaN NaN NaN 2.4 NaN NaN]' % signal No. 2
gapwidthhist = [1 1 1 1 0 0 0 0 0 0 0 0 0 0; % gap histogram for signal No. 1
3 1 0 0 1 0 0 0 0 0 0 0 0 0]' % gap histogram for signal No. 2
where integer histogram bins (gap widths) are 1:Nsamples (Nsamples=14).
The corresponding compressed gap histogram looks like:
gapwidthhist_compressed_widths = cell(1,Nsig)
gapwidthhist_compressed_widths =
1×2 cell array
{[1 2 3 4]} {[1 2 5]}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
gapwidthhist_compressed_freqs = cell(1, Nsig)
gapwidthhist_compressed_freqs =
1×2 cell array
{[1 1 1 1]} {[3 1 1]}
Typical problem dimension:
Nsamples = 1e5 - 1e6
Nsig = 1e2 - 1e3.
Thanks in advance for any help.
Added remark: My best solution so far is the following code:
signals = [1.1 NaN NaN NaN -1.4 NaN 8.3 NaN NaN NaN NaN 1.5 NaN NaN; % signal No. 1
NaN 2.2 NaN 4.9 NaN 8.2 NaN NaN NaN NaN NaN 2.4 NaN NaN; % signal No. 2
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN]' % signal No. 3
[numData, numSignals] = size(signals)
gapwidthhist = zeros(numData, numSignals);
for column = 1 : numSignals
thisSignal = signals(:, column); % Extract this column.
% Find lengths of all NAN runs
props = regionprops(isnan(thisSignal), 'Area');
allLengths = [props.Area]
edges = [1:max(allLengths), inf]
hc = histcounts(allLengths, edges)
% Load up gapwidthhist
for k2 = 1 : length(hc)
gapwidthhist(k2, column) = hc(k2);
end
end
% What it is:
gapwidthhist'
But I am mainly looking for pure MATLAB code that does not need any toolbox functions (like "regionprops" from the Image Processing Toolbox)!
This is a much simpler MATLAB implementation, but it is still not optimal (and not vectorized):
signals = [1.1 NaN NaN NaN -1.4 NaN 8.3 NaN NaN NaN NaN 1.5 NaN NaN; % signal No. 1
NaN 2.2 NaN 4.9 NaN 8.2 NaN NaN NaN NaN NaN 2.4 NaN NaN; % signal No. 2
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN]'; % signal No. 3
signals
[numData, numSignals] = size(signals);
gapwidthhist = zeros(numData, numSignals);
gaps = zeros(numData+1,numSignals);
auxnan = isnan(signals);
for i = 1:numSignals
c = 0;
for j = 1:numData
if auxnan(j,i)
c = c + 1;
else
gaps(j,i) = c;
c = 0;
end
end
gaps(numData+1,i) = c;
gapwidthhist(:,i) = histcounts(gaps(:,i),1:numData+1);
end
gapwidthhist
Thanks to @breaker for the help.
Any idea how to optimize (vectorize) this code to make it more efficient?
Here is a slightly more vectorized version that may be a bit quicker. I use Octave, so I don't know how much MATLAB's JIT compiler will optimize the inner loop in the other approach.
% Set up the data
signals = [1.1 NaN NaN NaN -1.4 NaN 8.3 NaN NaN NaN NaN 1.5 NaN NaN; % signal No. 1
NaN 2.2 NaN 4.9 NaN 8.2 NaN NaN NaN NaN NaN 2.4 NaN NaN; % signal No. 2
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN]'; % signal No. 3
signals
[numData, numSignals] = size(signals);
gapwidthhist = zeros(numData, numSignals);
gaps = zeros(numData+1,numSignals);
auxnan = ~isnan(signals); % We want non-NaN values to be 1
for i = 1:numSignals
difflist = diff(find([1; auxnan(:,i); 1])) - 1; % get the gap lengths
gapList = difflist(find(difflist)); % keep only the non-zero gaps
for c = gapList.' % need row vector to loop over elements
gapwidthhist(c,i) = gapwidthhist(c,i) + 1; % each gap length increments the histogram
end
end
gapwidthhist
Here's the program flow:
First, negate the auxnan array so that NaN is 0 and non-NaN is 1.
In the outer loop, pad each column with 1's on top and bottom to capture strings of NaN at the beginning and end of the signal.
Use find to get the indices of the 1 (non-NaN) elements.
Take the diff of the indices.
A diff of 1 means no gap and a diff greater than 1 gives the length of the gap plus 1, so subtract 1 from the diff result.
Use the results (indices) of find to get the values of the nonzero elements. These are the gap widths.
Now loop through the values and accumulate the results in the histogram. You might try replacing this inner loop with accumarray to see if that speeds things up any.
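As a sketch of the accumarray idea, the padding trick can be applied to all columns at once, which removes both loops. The variable names padded, idx, widths, cols and isGap are chosen for illustration, and the sketch assumes that at least one gap exists. For the problem sizes mentioned in the question, the padded logical array is about as large as the data itself, so memory use should be checked.

```matlab
signals = [1.1 NaN NaN NaN -1.4 NaN 8.3 NaN NaN NaN NaN 1.5 NaN NaN; % signal No. 1
           NaN 2.2 NaN 4.9 NaN 8.2 NaN NaN NaN NaN NaN 2.4 NaN NaN;  % signal No. 2
           1   NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN]'; % signal No. 3
[numData, numSignals] = size(signals);

% Pad every column with a non-NaN marker on top and bottom
padded = [true(1, numSignals); ~isnan(signals); true(1, numSignals)];

idx    = find(padded);                        % linear indices of non-NaN entries
widths = diff(idx) - 1;                       % number of NaNs between neighbours
cols   = ceil(idx(1:end-1) / (numData + 2));  % column in which each gap starts

% The step from the bottom pad of one column to the top pad of the next
% has width 0, so it is discarded together with the ordinary non-gaps
isGap = widths > 0;
gapwidthhist = accumarray([widths(isGap), cols(isGap)], 1, [numData, numSignals]);
```

Each row of the subscript matrix passed to accumarray is a (gap width, signal number) pair, so every gap adds 1 to the matching histogram bin.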
Possibly the final solution:
signals = [1.1 NaN NaN NaN -1.4 NaN 8.3 NaN NaN NaN NaN 1.5 NaN NaN; % signal No. 1
NaN 2.2 NaN 4.9 NaN 8.2 NaN NaN NaN NaN NaN 2.4 NaN NaN; % signal No. 2
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN]'; % signal No. 3
signals
[numData, numSignals] = size(signals);
gapwidthhist = zeros(numData, numSignals);
auxnan = isnan(signals);
for i = 1:numSignals
c = 0;
for j = 1:numData
if auxnan(j,i)
c = c + 1;
else
if c > 0
gapwidthhist(c,i) = gapwidthhist(c,i) + 1;
c = 0;
end
end
end
if c > 0
gapwidthhist(c,i) = gapwidthhist(c,i) + 1;
end
end
gapwidthhist
Open question: how can the code be modified so that the outer for-loop can be replaced by a parfor-loop?
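One possible approach, offered as an untested sketch rather than a definitive answer: parfor generally wants output arrays to be indexed in a fixed "sliced" form such as gapwidthhist(:,i), and the index c in gapwidthhist(c,i) changes inside the loop. Accumulating into a temporary column h and assigning it once per iteration avoids that. The names h and col are made up for illustration, and running in parallel assumes the Parallel Computing Toolbox is available.

```matlab
signals = [1.1 NaN NaN NaN -1.4 NaN 8.3 NaN NaN NaN NaN 1.5 NaN NaN; % signal No. 1
           NaN 2.2 NaN 4.9 NaN 8.2 NaN NaN NaN NaN NaN 2.4 NaN NaN;  % signal No. 2
           1   NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN]'; % signal No. 3
[numData, numSignals] = size(signals);
gapwidthhist = zeros(numData, numSignals);
auxnan = isnan(signals);
parfor i = 1:numSignals
    h = zeros(numData, 1);   % per-iteration histogram (temporary variable)
    col = auxnan(:, i);      % sliced read of one signal
    c = 0;                   % length of the current NaN run
    for j = 1:numData
        if col(j)
            c = c + 1;
        elseif c > 0
            h(c) = h(c) + 1; % a run just ended
            c = 0;
        end
    end
    if c > 0
        h(c) = h(c) + 1;     % run that reaches the end of the signal
    end
    gapwidthhist(:, i) = h;  % single sliced write per iteration
end
```

Whether this is faster than the plain loop depends on the worker start-up cost and on how much work each column carries.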
Knowing that:
There is a lot of discussion about plotting equal-sized matrices in a cell array, and it is quite easy to do without a loop.
For example, to plot the 2-by-2 matrices in mycell:
mycell = {[1 1; 2 1], [1 1; 3 1], [1 1; 4 1]};
We can use cellfun to add a row of NaN at the bottom of each matrix and then convert the cell to a matrix:
mycellnaned = cellfun(@(x) {[x;nan(1,2)]}, mycell);
mymat = cell2mat(mycellnaned); % mycell is a row cell here, so no transpose is needed
mymat looks like:
1 1 1 1 1 1
2 1 3 1 4 1
NaN NaN NaN NaN NaN NaN
Then we can plot it easily:
mymatx = mymat(:,1:2:end);
mymaty = mymat(:,2:2:end);
figure;
plot(mymatx, mymaty,'+-');
The problem:
The problem is now, how do I do something similar with a cell containing non-equal matrices? Such as:
mycell = {
[1:2; ones(1,2)]';
[1:4; ones(1,4)*2]';
[1:6; ones(1,6)*3]';
[1:8; ones(1,8)*4]';
[1:10; ones(1,10)*5]';
[1:12; ones(1,12)*6]';
};
mycell = repmat(mycell,1000,1);
I would not be able to convert them into one matrix like I did before. I could use a loop, as suggested in this answer, but it would be very inefficient if the cell contains thousands of matrices.
Therefore, I'm looking for a more efficient way of plotting non-equal sized matrices in a cell array.
Note that different colours should be used for different matrices in the figure.
Well, while I was writing the question, I figured it out...
I'd like to keep the question open since there might be better solutions.
For everyone else's reference, the solution is simple: add NaN to make the matrices equal sized:
% find out the maximum length of all matrices in the array
cellLengthMax = max(cellfun('length', mycell));
% fill the matrices so they are equal in size.
mycellfilled = cellfun(@(x) {[
x
nan(cellLengthMax-size(x,1), 2)
nan(1, 2)
]}, mycell);
Then convert to a matrix and plot:
mymat = cell2mat(mycellfilled');
mymatx = mymat(:,1:2:end);
mymaty = mymat(:,2:2:end);
figure;
plot(mymatx, mymaty,'+-');
mymat looks like:
1 1 1 2 1 3 1 4 1 5 1 6
2 1 2 2 2 3 2 4 2 5 2 6
NaN NaN 3 2 3 3 3 4 3 5 3 6
NaN NaN 4 2 4 3 4 4 4 5 4 6
NaN NaN NaN NaN 5 3 5 4 5 5 5 6
NaN NaN NaN NaN 6 3 6 4 6 5 6 6
NaN NaN NaN NaN NaN NaN 7 4 7 5 7 6
NaN NaN NaN NaN NaN NaN 8 4 8 5 8 6
NaN NaN NaN NaN NaN NaN NaN NaN 9 5 9 6
NaN NaN NaN NaN NaN NaN NaN NaN 10 5 10 6
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 11 6
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 12 6
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Update:
Time cost for plotting 6000 matrices:
using the solution proposed here: 1.183546 seconds.
using a loop: 3.450423 seconds.
Still not very satisfactory. I would really like to reduce the time to 0.1 seconds, because I'm trying to design an interactive UI where the user can change a few parameters and the result gets plotted instantly.
I don't want to reduce the resolution of the figure.
Update:
I ran the profiler and it seems that 99% of the time is spent in plot(mymatx, mymaty,'+-');. So the conclusion is that there is probably no other way to speed this up.
I have a time stamp vector as follows.
Time =
243.0000
243.0069
243.0139
243.0208
243.0278
243.0347
243.0417
243.0486
243.0556
243.0625
243.0694
243.0764
243.0833
243.0903
243.0972
243.1042
243.1111
243.1181
243.1250
243.1319
243.1389
243.1458
243.1528
243.1597
243.1667
243.1736
243.1806
243.1875
243.1944
Now I have another two-column array.
ab =
243.0300 0.5814
243.0717 0.6405
243.1134 0.6000
243.1550 0.5848
243.1967 0.5869
First column is 'Time2' and second column is 'Conc'.
Time2 = ab(:,1);
Conc = ab(:,2);
Now I want to match 'Conc' to 'Time' based on 'Time2', filling the unmatched entries with 'NaN'. Note that the values in 'Time2' are not exactly the same as those in 'Time'. I could use something like the following
Conc_interpolated = interp1(Time2,Conc,Time)
but that fills in interpolated, artificial data. I only want to match the vector length by filling 'Conc' with 'NaN', not with interpolated data. Any recommendations? Thanks
I'll try to guess what you want:
You have a time vector A:
TimeA = ...
[243.0000;
243.0069;
...
243.1875;
243.1944];
and probably some data A:
DataA = rand(length(TimeA),1);
Now you want to include your second time vector B:
TimeB = ...
[243.0300;
243.0717;
243.1134;
243.1550;
243.1967];
and the according data:
DataB = ...
[0.5814;
0.6405;
0.6000;
0.5848;
0.5869];
finally everything should be merged together and sorted:
X = [ TimeA, DataA , NaN(size(DataA)) ;
TimeB, NaN(size(DataB)) , DataB ]
Y = sortrows(X,1);
which results in:
Y =
243.0000 0.8852 NaN
243.0069 0.9133 NaN
243.0139 0.7962 NaN
243.0208 0.0987 NaN
243.0278 0.2619 NaN
243.0300 NaN 0.5814
243.0347 0.3354 NaN
243.0417 0.6797 NaN
243.0486 0.1366 NaN
243.0556 0.7212 NaN
243.0625 0.1068 NaN
243.0694 0.6538 NaN
243.0717 NaN 0.6405
243.0764 0.4942 NaN
243.0833 0.7791 NaN
243.0903 0.7150 NaN
243.0972 0.9037 NaN
243.1042 0.8909 NaN
243.1111 0.3342 NaN
243.1134 NaN 0.6000
243.1181 0.6987 NaN
243.1250 0.1978 NaN
243.1319 0.0305 NaN
243.1389 0.7441 NaN
243.1458 0.5000 NaN
243.1528 0.4799 NaN
243.1550 NaN 0.5848
243.1597 0.9047 NaN
243.1667 0.6099 NaN
243.1736 0.6177 NaN
243.1806 0.8594 NaN
243.1875 0.8055 NaN
243.1944 0.5767 NaN
243.1967 NaN 0.5869
is that right?
My understanding is a little different: it doesn't add to Time but rather assigns each Conc to the nearest Time based on its Time2:
ind = zeros(size(ab,1),1); %//preallocate memory
for ii = 1:size(ab,1)
[~, ind(ii)] = min(abs(ab(ii,1)-Time)); %//Based on this FEX entry: http://www.mathworks.com/matlabcentral/fileexchange/30029-findnearest-algorithm/content/findNearest.m
end
Time(:,2) = NaN; %// Prefill with NaN
Time(ind, 2) = ab(:,2)
This results in:
Time =
243.00000 NaN
243.00690 NaN
243.01390 NaN
243.02080 NaN
243.02780 0.58140
243.03470 NaN
243.04170 NaN
243.04860 NaN
243.05560 NaN
243.06250 NaN
243.06940 0.64050
243.07640 NaN
243.08330 NaN
243.09030 NaN
243.09720 NaN
243.10420 NaN
243.11110 0.60000
243.11810 NaN
243.12500 NaN
243.13190 NaN
243.13890 NaN
243.14580 NaN
243.15280 0.58480
243.15970 NaN
243.16670 NaN
243.17360 NaN
243.18060 NaN
243.18750 NaN
243.19440 0.58690
for your example inputs
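The nearest-time search can also be done without the loop. This is a sketch under one assumption: the Time vector is rebuilt here as 243 + (0:28)'/144 (10-minute steps expressed in days), which reproduces the rounded values shown in the question but is not stated there. Conc_matched is a name chosen for illustration.

```matlab
% Assumed reconstruction of the 29 time stamps (10-minute steps in days)
Time = 243 + (0:28)'/144;
ab = [243.0300 0.5814
      243.0717 0.6405
      243.1134 0.6000
      243.1550 0.5848
      243.1967 0.5869];

% Distance from every Time (rows) to every Time2 (columns), then the
% row index of the smallest distance in each column
[~, ind] = min(abs(bsxfun(@minus, Time, ab(:,1)')), [], 1);

Conc_matched = NaN(size(Time));   % prefill with NaN
Conc_matched(ind) = ab(:,2);      % place each Conc at its nearest Time
```

The distance matrix has numel(Time) by size(ab,1) elements, so for very long vectors the loop may still be the more memory-friendly choice.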
I'd like to replace all the NaNs in a vector with the most recent preceding non-NaN value:
input = [1 2 3 NaN NaN 2];
output = [1 2 3 3 3 2];
I'd like to try to speed up the loop I already have:
input = [1 2 3 NaN NaN 2];
if isnan(input(1))
input(1) = 0;
end
for i= 2:numel(input)
if isnan(input(i))
input(i) = input(i-1);
end
end
thanks in advance
Since you want the previous non-NaN value, I'll assume that the first value must be a number.
while(any(isnan(input)))
input(isnan(input)) = input(find(isnan(input))-1);
end
I profiled dylan's solution, Oleg's solution, and mine on a 47.7 million long vector. The times were 12.3s for dylan, 3.7 for Oleg, and 1.9 for mine.
Here is a commented solution. It works for a vector only, but it might be extended to work on a matrix:
A = [NaN NaN 1 2 3 NaN NaN 2 NaN NaN NaN 3 NaN 5 NaN NaN];
% start/end positions of NaN sequences
sten = diff([0 isnan(A) 0]);
B = [NaN A];
% replace with previous non NaN
B(sten == -1) = B(sten == 1);
% Trim first value (previously padded)
B = B(2:end);
Comparison
A: NaN NaN 1 2 3 NaN NaN 2 NaN NaN NaN 3 NaN 5 NaN NaN
B: NaN NaN 1 2 3 NaN 3 2 NaN NaN 2 3 3 5 NaN 5
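In the comparison above, B still contains NaN entries inside the longer runs. A fully vectorized forward fill can be sketched with cumsum: count the non-NaN values seen so far and use that count as an index into the list of non-NaN values. This is an alternative technique, not the method of the answer above; valid, idx, vals and filled are illustrative names, and leading NaNs are left as NaN by design.

```matlab
A = [NaN NaN 1 2 3 NaN NaN 2 NaN NaN NaN 3 NaN 5 NaN NaN];

valid = ~isnan(A);
idx   = cumsum(valid);      % how many non-NaN values have been seen so far
vals  = [NaN A(valid)];     % slot 1 serves positions before the first value
filled = vals(idx + 1);     % pick the most recent non-NaN value
```

For the vector from the question, [1 2 3 NaN NaN 2], the same three lines give [1 2 3 3 3 2].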
Not fully vectorized but quite simple and probably still fairly efficient:
x = [1 2 3 NaN NaN 2];
for f = find(isnan(x))
x(f)=x(f-1);
end
Of course, this is only slightly different from the solution provided by @Hugh Nolan:
nan_ind = find(isnan(A)==1);
A(nan_ind) = A(nan_ind-1);
I have a file with a huge amount of data. Places where there is no price information are marked as NaN. I would like to delete all rows where there are such entries, and delete all columns where a lot of data is missing (because I then need a proportional matrix).
I also have another string (AssetList) that contains all the tickers. If a column is deleted, the corresponding ticker must be deleted from it as well.
I would much appreciate any help.
Data:
6,41 16,51 x x 69,78
6,22 16 x x 68,48
6,17 15,61 x x 69,46
x x x x x
x x x x x
x x x x x
5,83 15,14 x x 69,85
6,4 17,64 x x 71,03
6,07 16,04 x x 68,64
5,91 17,09 x x 68,92
6 18,19 x x 68,72
x x x x x
x x x x x
5,58 17,17 x x 69,02
5,3 16,83 x x 67,69
5,66 19,65 x x 68,64
5,65 20,86 x x 69,45
5,43 20,46 x x 68,94
x x x x x
x x x x x
5,58 2 0,16 x 68,73
AssetList:
FLWS SRCE FUBC DDD MMM
I'll have to make some assumptions here, as I didn't fully understand your question.
The following first deletes all rows that consist exclusively of NaN, and then deletes all columns that contain at least one NaN:
M = [ ...
6.41 16.51 NaN NaN 69.78
6.22 16 NaN NaN 68.48
6.17 15.61 NaN NaN 69.46
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
5.83 15.14 NaN NaN 69.85
6.4 17.64 NaN NaN 71.03
6.07 16.04 NaN NaN 68.64
5.91 17.09 NaN NaN 68.92
6 18.19 NaN NaN 68.72
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
5.58 17.17 NaN NaN 69.02
5.3 16.83 NaN NaN 67.69
5.66 19.65 NaN NaN 68.64
5.65 20.86 NaN NaN 69.45
5.43 20.46 NaN NaN 68.94
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
5.58 2 0.16 NaN 68.73];
AssetList = {
'FLWS' 'SRCE' 'FUBC' 'DDD' 'MMM' };
% Delete all-NaN rows
M(all(isnan(M),2),:) = [];
% Delete any-NaN columns
colsToBeDeleted = any(isnan(M));
M(:, colsToBeDeleted) = []
AssetList(colsToBeDeleted) = []
Result:
M =
6.4100 16.5100 69.7800
6.2200 16.0000 68.4800
6.1700 15.6100 69.4600
5.8300 15.1400 69.8500
6.4000 17.6400 71.0300
6.0700 16.0400 68.6400
5.9100 17.0900 68.9200
6.0000 18.1900 68.7200
5.5800 17.1700 69.0200
5.3000 16.8300 67.6900
5.6600 19.6500 68.6400
5.6500 20.8600 69.4500
5.4300 20.4600 68.9400
5.5800 2.0000 68.7300
AssetList =
'FLWS' 'SRCE' 'MMM'
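If "a lot of missing data" is meant as a fraction rather than "any NaN", the column test can use a threshold instead. This is a sketch on a shortened version of the data; maxNanFraction is a hypothetical parameter, and its value of 0.5 is an arbitrary choice, not something given in the question.

```matlab
M = [6.41 16.51  NaN NaN 69.78
      NaN   NaN  NaN NaN   NaN
     5.83 15.14  NaN NaN 69.85
     5.58  2.00 0.16 NaN 68.73];
AssetList = {'FLWS' 'SRCE' 'FUBC' 'DDD' 'MMM'};
maxNanFraction = 0.5;                 % hypothetical threshold, adjust as needed

% Delete all-NaN rows first, so they do not count against every column
M(all(isnan(M), 2), :) = [];

% Delete columns whose share of NaN values exceeds the threshold
nanFraction = mean(isnan(M), 1);
colsToBeDeleted = nanFraction > maxNanFraction;
M(:, colsToBeDeleted) = [];
AssetList(colsToBeDeleted) = [];
```

With a threshold below 1, a column that survives can still contain some NaN values, so a later step has to tolerate them.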