Question: I am dealing with a patient data, where parameters are recorded at different sampling frequency, so having a different time stamp.
I want to create a matrix where data is interpolated by "Last known value" till the new original value changes in the time. So at the end I have uniform matrix where each parameter have values at every time stamp.
Data is in following format:
Time Hear Rate(Variable)
18:00:00 PM 74
18:02:00 PM 75
18:04:00 PM 85
18:06:00 PM 71
18:08:00 PM 79
18:10:00 PM 72
Time Blood Press. (Variable)
18:01:00 PM 100
18:05:00 PM 120
18:09:00 PM 121
Target :
Time Hear Rate(Variable) Blood Press.
18:00:00 PM 74 NaN
18:01:00 PM 74 100
18:02:00 PM 75 100
18:03:00 PM 75 100
18:04:00 PM 85 100
18:05:00 PM 85 120
18:06:00 PM 71 120
18:07:00 PM 71 120
18:08:00 PM 79 120
18:09:00 PM 79 121
18:10:00 PM 72 121
The Interpolated data in the missing place should be the previous value of the known event and should remain same until next change occurs.
I am currently referring the following thread from MATLAB User Forum
http://www.mathworks.com/matlabcentral/answers/101237-how-can-i-do-1-d-interpolation-with-interp1-to-find-the-nearest-value-to-the-left-of-the-point-i-e
You're essentially trying to do a zero-order hold as you've seen from the link you provided.
Let's call the first set of times t_hr (a cell array of strings) and the second set t_bp (also a cell array of strings). Then call the heart rates hr and the blood pressures bp.
Build a new cell array t by combining t_hr and t_bp:
t = [t_hr; t_bp];
Fill in hr and bp with some NaNs to make them the same length as t. In t, the first part of the vector corresponds to times we know about for hr, and the second half corresponds to bp. Use that knowledge accordingly:
hr = [hr; nan(length(t) - length(hr),1)];
bp = [nan(length(t) - length(bp),1); bp];
Now you've got three vectors: t (okay, that's actually a cell array); hr and bp, and the elements of hr and bp correspond to elements in t, including the NaNs:
t =
'18:00:00 PM'
'18:02:00 PM'
'18:04:00 PM'
'18:06:00 PM'
'18:08:00 PM'
'18:10:00 PM'
'18:01:00 PM'
'18:05:00 PM'
'18:09:00 PM'
hr =
74
75
85
71
79
72
NaN
NaN
NaN
bp =
NaN
NaN
NaN
NaN
NaN
NaN
100
120
121
Now, we can sort t:
[t_sorted, idx] = sort(t);
idx contains the indices of t that got moved around to form t_sorted. In this case, idx == [1 7 2 3 8 4 5 9 6].'. We can use that to sort bp and hr:
hr_sorted = hr(idx)
hr_sorted =
74
NaN
75
85
NaN
71
79
NaN
72
bp_sorted = bp(idx)
bp_sorted =
NaN
100
NaN
NaN
120
NaN
NaN
121
NaN
Then, apply the zero-order hold:
for ii = 2:length(t)
if isnan(hr_sorted(ii))
hr_sorted(ii) = hr_sorted(ii-1);
end
if isnan(bp_sorted(ii))
bp_sorted(ii) = bp_sorted(ii-1);
end
end
and your final vectors become:
hr_sorted =
74
74
75
85
85
71
79
79
72
bp_sorted =
NaN
100
100
100
120
120
120
121
121
Note that your target answer has 11 different times in it, but you only supplied nine. (It's missing 18:03 and 18:07.) You can easily extend this answer by adding extra NaNs to hr and bp, and making t a cell array comprising of, say, t_hr, t_bp, and a cell array t_missing ={'18:03:00 PM', '18:07:00 PM'}.
Related
I am a beginner with MATLAB and I am struggling with this assignment. Can anyone guide me through it?
Consider the data given below:
x = [ 1 , 48 , 81 , 2 , 10 , 25 , ,14 , 18 , 53 , 41, 56, 89,0, 1000, , ...
34, 47, 455, 21, , 22, 100 ];
Once the data is loaded, see if you can find any:
Outliers or
Missing data in the data file
Correct the missing values using median, mode and noisy data using median binning, mean binning and bin boundaries.
This isn't so bad. First off, take a look at the distribution of your data. You can see that the majority of your data has double digits. The outliers are those with single digits, or those that are way larger than double digits. Mind you, this is totally subjective so someone else may tell you that the single digits are part of your data too. Also, the missing data are those numbers that are spaces in between the commas. Let's write some MATLAB code and change these to NaN (or not-a-number), because if you try copying and pasting this code directly into MATLAB, it will give you a syntax error because if you are explicitly defining numbers this way, you have to be sure all of them are there.
To do this, use regexprep so that any parts of this string that have a comma, space, then another comma, put a NaN in between. To do this, we need to put this statement as a string first. We then use eval to convert this string to an actual MATLAB statement:
x = '[ 1 , 48 , 81 , 2 , 10 , 25 , ,14 , 18 , 53 , 41, 56, 89,0, 1000, , 34, 47, 455, 21, , 22, 100 ];'
y = eval(regexprep(x, ', ,', ', NaN, '));
If we display this data, we get:
y =
Columns 1 through 6
1 48 81 2 10 25
Columns 7 through 12
NaN 14 18 53 41 56
Columns 13 through 18
89 0 1000 NaN 34 47
Columns 19 through 23
455 21 NaN 22 100
As such, to answer our first question, any values that are missing are denoted as NaN and those numbers that are bigger than double digits are outliers.
For the next question, we simply extract those values that are not missing, calculate the mean and median of what is not missing, and fill in those NaN values with the mean and median. For the bin boundaries, this is the same thing as using the values to the left (or right... depends on your definition, but let's use left) of the missing value and fill those in. As such:
yMissing = isnan(y); %// Which values are missing?
y_noNaN = y(~yMissing); %// Extract the non-missing values
meanY = mean(y_noNaN); %// Get the mean
medianY = median(y_noNaN); %// Get the median
%// Output - Fill in missing values with median
yMedian = y;
yMedian(yMissing) = medianY;
%// Same for mean
yMean = y;
yMean(yMissing) = meanY;
%// Bin boundaries
yBinBound = y;
yBinBound(yMissing) = y(find(yMissing)-1);
The mean and median for the data of the non-missing values is:
meanY =
105.8500
medianY =
37.5000
The outputs for each of these, in addition to the original data with the missing values looks like:
format bank; %// Do this to show just the first two decimal places for compact output
format compact;
y =
Columns 1 through 5
1 48 81 2 10
Columns 6 through 10
25 NaN 14 18 53
Columns 11 through 15
41 56 89 0 1000
Columns 16 through 20
NaN 34 47 455 21
Columns 21 through 23
NaN 22 100
yMean =
Columns 1 through 5
1.00 48.00 81.00 2.00 10.00
Columns 6 through 10
25.00 105.85 14.00 18.00 53.00
Columns 11 through 15
41.00 56.00 89.00 0 1000.00
Columns 16 through 20
105.85 34.00 47.00 455.00 21.00
Columns 21 through 23
105.85 22.00 100.00
yMedian =
Columns 1 through 5
1.00 48.00 81.00 2.00 10.00
Columns 6 through 10
25.00 37.50 14.00 18.00 53.00
Columns 11 through 15
41.00 56.00 89.00 0 1000.00
Columns 16 through 20
37.50 34.00 47.00 455.00 21.00
Columns 21 through 23
37.50 22.00 100.00
yBinBound =
Columns 1 through 5
1.00 48.00 81.00 2.00 10.00
Columns 6 through 10
25.00 25.00 14.00 18.00 53.00
Columns 11 through 15
41.00 56.00 89.00 0 1000.00
Columns 16 through 20
1000.00 34.00 47.00 455.00 21.00
Columns 21 through 23
21.00 22.00 100.00
If you take a look at each of the output values, this fills in our data with the mean, median and also the bin boundaries as per the question.
all,
I have a large dataset with a lot of continuous NAs, is there any fast way to replace the NAs with the average of previous and next non-missing value by column?
Thanks a lot
Lou
Interesting question... if only you explained clearly what you want. Maybe it's this?
data = [1 3 NaN 7 6 NaN NaN 2].'; %'// example data: column vector
isn = isnan(data); %// determine which values are NaN
inum = find(~isn); %// indices of numbers
inan = find(isn); %// indices of NaNs
comp = bsxfun(#lt,inan.',inum); %'// for each (number,NaN): 1 if NaN precedes num
[~, upper] = max(comp); %// next number to each NaN (max finds *first* maximum)
data(isn) = (data(inum(upper))+data(inum(upper-1)))/2; %// fill with average
In this example: original data:
>> data.'
ans =
1 3 NaN 7 6 NaN NaN 2
Result:
>> data.'
ans =
1 3 5 7 6 4 4 2
If you have a 2D array and want to work by columns, a for loop over columns is probably the best option.
And of course, if there can be NaN's at the beginning or end of a column, the problem is undefined.
Assuming NaNs are not in the first/last row in any column, here is how I would do it:
(If there are multiple consecutive NaNs, it searches for previous ann next non-missing values and averages them).
% Creating A
A=magic(7);
newA=A; %Result will be in newA
A(3,4)=NaN;
A(2,1)=NaN;
A(5,6)=NaN;
A(6,6)=NaN;
A(4,6)=NaN;
% Finding NaN position and calculating positions where we have to average numbers
ind=find(isnan(A));
otherInd=setdiff(1:numel(A(:)),ind);
for i=1:size(ind,1)
temp=otherInd(otherInd<ind(i));
prevInd(i,1)=temp(end);
temp=otherInd(otherInd>ind(i));
nextInd(i,1)=temp(1);
end
% For faster processing purposes
allInd(1:2:2*length(prevInd))=prevInd;
allInd(2:2:2*length(prevInd))=nextInd;
fun=#(block_struct) mean(block_struct.data)
prevNextNums=A(allInd);
A
newA(ind)=blockproc(prevNextNums,[1 2],fun)
%-----------------------Answer--------------------------
A =
30 39 48 1 10 19 28
NaN 47 7 9 18 27 29
46 6 8 NaN 26 35 37
5 14 16 25 34 NaN 45
13 15 24 33 42 NaN 4
21 23 32 41 43 NaN 12
22 31 40 49 2 11 20
newA =
30 39 48 1 10 19 28
38 47 7 9 18 27 29
46 6 8 17 26 35 37
5 14 16 25 34 23 45
13 15 24 33 42 23 4
21 23 32 41 43 23 12
22 31 40 49 2 11 20
I have a double matrix B that has in C1 the year 1998 and in C2 a code (unrepeated). Each C2 is given a value in C3.
% C1 C2 C3
B=[ 1998 22 37
1998 34 7
1998 76 12
1998 98 29
1998 107 14
…]
This is how I got to B:
N1=[N{:,1} N{:,2} N{:,3}];
N1=unique(N1(:,1:3),'rows');
N3= unique(N1(:,1:2),'rows');
for m=1:size(N3,1)
N3(m,3)=sum(N1(:,1) == N3(m,1) & N1(:,2)==N3(m,2));
end
B=N3((N3(:,1) == 1998),:);
I have a cell-array A with the years horizontally disposed in R1, un-repeated values in Y, and corresponding codes in the columns that follow. The codes match the ones in C2 from variable B.
A={Y 1996 1997 1998 1999 %R1
1 107 107 22 22
13 98 98 76 1267
… }
Is there a way I could get a new variable that recognizes the change in the codes in variable A, and presents the corresponding values from C3 in B? For instance:
AB={Y Initial C2 Change C2
1 107 14 22 37
13 98 29 76 12 }
AB = {'Y' 'Initial' 'C2' 'Change' 'C2'}; % initialize result cell array
for i=2:size(A,1) % loop through rows of A
indicesOfChanges = find(diff([A{i,2:end}])); % find changes in row
for j=1:length(indicesOfChanges) % for all changes look up the corresponding values
ind1=find(B(:,2)==A{i,indicesOfChanges(j)+1}); % row index in B of value before change
ind2=find(B(:,2)==A{i,indicesOfChanges(j)+2}); % row index in B of value after change
AB{end+1,1} = A{i,1};
AB{end,2} = B(ind1,2);
AB{end,3} = B(ind1,3);
AB{end,4} = B(ind2,2);
AB{end,5} = B(ind2,3);
end
end
Maybe this could be further improved by using a vectorized approach, but as long as your arrays aren't too big it should work fast enough using loops.
I have an MxN matrix and I want a column vector v, using the vector s that tells me for each row in the matrix what column I will take.
Here's an example:
Matrix =
[ 4 13 93 20 42;
31 18 94 64 02;
7 44 24 91 15;
11 20 43 38 31;
21 42 72 60 99;
13 81 31 87 50;
32 22 83 24 04]
s = [4 4 5 4 4 4 3].'
And the desired output is:
v = [20 64 15 38 60 87 83].'
I thought using the expression
Matrix(:,s)
would've work but it doesn't. Is there a solution without using for loops to access the rows separately?
It's not pretty, and there might be better solutions, but you can use the function sub2ind like this:
M(sub2ind(size(M),1:numel(s),s'))
You can also do it with linear indexing, here is an example:
M=M'; s=s';
M([0:size(M,1):numel(M)-1]+s)
I have a matrix like:
A=
10 31 32 22
32 35 52 77
68 42 84 32
I need a function like mode but with range, for example mymode(A,10) that return 30, find most frequent number in range 0-10, 10-20, 20-30, .... and return most number in range.
You can use histc to bin your data into the ranges of your desire and then find the bin with the most members using max on the output of histc
ranges = 0:10:50; % your desired ranges
[n, bins] = histc(A(:), ranges); % bin the data
[v,i] = max(n); % find the bin with most occurrences
[ranges(i) ranges(i+1)] % edges of the most frequent bin
For your specific example this returns
ans =
30 40
which matches with your required output, as the most values in A lay between 30 and 40.
[M,F] = mode( A((A>=2) & (A<=5)) ) %//only interested in range 2 to 5
...where M will give you the mode and F will give you frequency of occurence
> A = [10 31 32 22; 32 35 52 77; 68 42 84 32]
A =
10 31 32 22
32 35 52 77
68 42 84 32
> min = 10
min = 10
> max = 40
max = 40
> mode(A(A >= min & A <= max))
ans = 32
>
I guess by the number of different answers that we may be missing your goal. Here is my interpretation.
If you want to have many ranges and you want to output most frequent number for every range, create a cell containing all desired ranges (they could overlap) and use cellfun to run mode() for every range. You can also create a cell with desired ranges using arrayfun in a similar manner:
A = [10 31 32 22; 32 35 52 77; 68 42 84 32];
% create ranges
range_step = 10;
range_start=[0:range_step:40];
range=arrayfun(#(r)([r r+range_step]), range_start, 'UniformOutput', false)
% analyze ranges
o = cellfun(#(r)(mode(A(A>=r(1) & A<=r(2)))), range, 'UniformOutput', false)
o =
[10] [10] [22] [32] [42]