Audio filtering strange issue with MATLAB - matlab

I'm filternig an audio file with matlab inside a function script which has a function named "filtro_pf" created by me which is a band-pass filter and its outputs are the IIR corresponding coefficients:
[b,a] = filtro_pf(Ap,As,fp1,fs1,fp2,fs2,1); //BAND PASS FILTER
b2 = [0.0039,0,-0.0156,0,0.0234,0,-0.0156,0,0.0039];
a2 =[1.0000,-6.2682,17.2660,-27.3421,27.2582,-17.5284,7.0998,-1.6555,0.1701];
wfpf = filter(b,a,audio_stream);
wavplay(wfpf,fs);
Note that the second and third lines (b2 and a2) are the values which the function 'filtro_pb()' gives me for those inputs. I've run once and then copied them to these variables. Now, after I run this script, if I ask for 'a' and 'a2' in the console I will have:
a =
1.0000 -6.2682 17.2660 -27.3421 27.2582 -17.5284 7.0998 -1.6555 0.1701
a2 =
1.0000 -6.2682 17.2660 -27.3421 27.2582 -17.5284 7.0998 -1.6555 0.1701
They are pretty much the same. But if I use 'a2' instead of 'a' in the filter() function, it does not work. I hear a kind of tick sound and that's it. With 'a' I can hear the sound correctly filtered. This same code is used before and it does work:
%[b,a] = filtro_pa(Ap,As,fp,fs,1); //HIGH PASS FILTER
b = [0.5411 -1.6232 1.6232 -0.5411];
a = [1.0000 -1.8062 1.2298 -0.2925];
wfpa = filter(b2, a2, audio_stream);
wavplay(wfpa,fs);
Again, I used this script previously and saw that these values (from a2 and b2) were the output to these inputs. Now instead of calling the function again (which is commented by the way) I use the 'a' and 'b'vectors directly. It does work for the LowPass and for the HighPass filter.
All of this are for test purposes so I do not expect suggestions like "why use the vector instead of calling the function then?".
I just want to know, how can the function not work with the second variable if they are pretty much the same?

There is more precision present in the variables than is displayed, which means that your a2 and b2 vectors are not the same as a and b. It might appear surprising that errors on that order would make the filter unstable, but it appears that is what is happening. You should be able to explore this by looking at the filter response with freqz, and by plotting up the resulting audio vector rather than just listening to it.
You can use format long to print more precision, but these will still have some rounding error. To avoid, save the vectors to a .mat file and reload that. The .mat file will use binary format and store the full precision of your vectors.
The reason it works for the other filters is probably because those filters are less sensitive to rounding errors in their coefficients: they have fewer coefficients, and those coefficients are less extreme in value.
Here's a sample comparison of frequency response:
[H W] = freqz(b2, a2); % your filter (with error)
a_error = zeros(size(a2));
a_error(9) = a_error(9)+.001; % a little bit of error in a single coefficient
[HE WE] = freqz(b2, a2 + a_error); % frequency response of THAT filter
plot(log10(abs([H HE])))
As you can see, a small change makes a large difference.
An actual analysis of the instability comes from looking at the poles and zeros of the filter:
[z p k] = tf2zp(b2, a2);
abs(p)
If any poles have magnitude greater than 1 (this one does), the filter will be unstable. Try the real values, then your "approximate" values, and see what happens.

Related

Different results using filter object and z,p,k implementation

I'm comparing the output of digital filtering using MATLAB filter object and zero-pole-gain method, and they are not quite the same. I'll appreciate if anyone could point me out in the right direction, and to know what exactly matlab does, because it seems it doesn't use b-a coefficients nor zero-pole-gain method. Here there are my results and the test script (download data.txt):
srate=64;
freqrange=[0.4 3.5];
file='data.txt';
load(file);
file=split(file,'.');
file=file{1};
data=eval(file);
st=srate*60;
ed=srate*60*2;
data=data(st:ed);%1 minute data
m=numel(data);
x=data;
R=0.1;%10% of signal
Nr=50;
NR=min(round(m*R),Nr);%At most 50 points
x1=2*x(1)-flipud(x(2:NR+1));%maintain continuity in level and slope
x2=2*x(end)-flipud(x(end-NR:end-1));
x=[x1;x;x2];
[xx,dbp]=bandpass(x,freqrange,srate);%same result if use filter(dbp,xx)
data_fil=xx(NR+1:end-NR);
[z,p,k]=dbp.zpk;
[sos, g] = zp2sos(z, p, k);
xx=filtfilt(sos, g, x);
data_fil_zpk=xx(NR+1:end-NR);
f=figure('Numbertitle','off','Name','test_filters','units','normalized','outerposition',[0 0
1 1],'menubar','none','Visible','on');
plot([data data_fil data_fil_zpk])
legend('raw','dbp filter','zpk filter')
title('raw vs dbp vs zpk')
error=norm(data_fil-data_fil_zpk)/norm(data_fil)*100
d1=fvtool(dbp);
set(d1,'Name','MR_dbp');
d2=fvtool(sos);
set(d2,'Name','MR_zpk');
and here there is the result I obtain:
as you can see, the filtered signals are almost the same. At this point I'm lost at what exactly matlab does.
Here you can see the magnitude response plots:
They have the same shape, so they are filtering, but in the case of the z-p-k one, max magnitude is around 60 dB while the other is 0db. Why does this happens? Does that means that zpk is amplifying the frequencies?
Finally, you can the the PSD plot for both filtered signals (first dbp, second zpk):
Once again we can see the filter is filtering but you can also see a greater value in the first peak of the second plot (zpk filter)...

why does a*b*a take longer than (a'*(a*b)')' when using gpuArray in Matlab scripts?

The code below performs the operation the same operation on gpuArrays a and b in two different ways. The first part computes (a'*(a*b)')' , while the second part computes a*b*a. The results are then verified to be the same.
%function test
clear
rng('default');rng(1);
a=sprand(3000,3000,0.1);
b=rand(3000,3000);
a=gpuArray(a);
b=gpuArray(b);
tic;
c1=gather(transpose(transpose(a)*transpose(a*b)));
disp(['time for (a''*(a*b)'')'': ' , num2str(toc),'s'])
clearvars -except c1
rng('default');
rng(1)
a=sprand(3000,3000,0.1);
b=rand(3000,3000);
a=gpuArray(a);
b=gpuArray(b);
tic;
c2=gather(a*b*a);
disp(['time for a*b*a: ' , num2str(toc),'s'])
disp(['error = ',num2str(max(max(abs(c1-c2))))])
%end
However, computing (a'*(a*b)')' is roughly 4 times faster than computing a*b*a. Here is the output of the above script in R2018a on an Nvidia K20 (I've tried different versions and different GPUs with the similar behaviour).
>> test
time for (a'*(a*b)')': 0.43234s
time for a*b*a: 1.7175s
error = 2.0009e-11
Even more strangely, if the first and last lines of the above script are uncommented (to turn it into a function), then both take the longer amount of time (~1.7s instead of ~0.4s). Below is the output for this case:
>> test
time for (a'*(a*b)')': 1.717s
time for a*b*a: 1.7153s
error = 1.0914e-11
I'd like to know what is causing this behaviour, and how to perform a*b*a or (a'*(a*b)')' or both in the shorter amount of time (i.e. ~0.4s rather than ~1.7s) inside a matlab function rather than inside a script.
There seem to be an issue with multiplication of two sparse matrices on GPU. time for sparse by full matrix is more than 1000 times faster than sparse by sparse. A simple example:
str={'sparse*sparse','sparse*full'};
for ii=1:2
rng(1);
a=sprand(3000,3000,0.1);
b=sprand(3000,3000,0.1);
if ii==2
b=full(b);
end
a=gpuArray(a);
b=gpuArray(b);
tic
c=a*b;
disp(['time for ',str{ii},': ' , num2str(toc),'s'])
end
In your context, it is the last multiplication which does it. to demonstrate I replace a with a duplicate c, and multiply by it twice, once as sparse and once as full matrix.
str={'a*b*a','a*b*full(a)'};
for ii=1:2
%rng('default');
rng(1)
a=sprand(3000,3000,0.1);
b=rand(3000,3000);
rng(1)
c=sprand(3000,3000,0.1);
if ii==2
c=full(c);
end
a=gpuArray(a);
b=gpuArray(b);
c=gpuArray(c);
tic;
c1{ii}=a*b*c;
disp(['time for ',str{ii},': ' , num2str(toc),'s'])
end
disp(['error = ',num2str(max(max(abs(c1{1}-c1{2}))))])
I may be wrong, but my conclusion is that a * b * a involves multiplication of two sparse matrices (a and a again) and is not treated well, while using transpose() approach divides the process to two stage multiplication, in none of which there are two sparse matrices.
I got in touch with Mathworks tech support and Rylan finally shed some light on this issue. (Thanks Rylan!) His full response is below. The function vs script issue appears to be related to certain optimizations matlab applies automatically to functions (but not scripts) not working as expected.
Rylan's response:
Thank you for your patience on this issue. I have consulted with the MATLAB GPU computing developers to understand this better.
This issue is caused by internal optimizations done by MATLAB when encountering some specific operations like matrix-matrix multiplication and transpose. Some of these optimizations may be enabled specifically when executing a MATLAB function (or anonymous function) rather than a script.
When your initial code was being executed from a script, a particular matrix transpose optimization is not performed, which results in the 'res2' expression being faster than the 'res1' expression:
n = 2000;
a=gpuArray(sprand(n,n,0.01));
b=gpuArray(rand(n));
tic;res1=a*b*a;wait(gpuDevice);toc % Elapsed time is 0.884099 seconds.
tic;res2=transpose(transpose(a)*transpose(a*b));wait(gpuDevice);toc % Elapsed time is 0.068855 seconds.
However when the above code is placed in a MATLAB function file, an additional matrix transpose-times optimization is done which causes the 'res2' expression to go through a different code path (and different CUDA library function call) compared to the same line being called from a script. Therefore this optimization generates slower results for the 'res2' line when called from a function file.
To avoid this issue from occurring in a function file, the transpose and multiply operations would need to be split in a manner that stops MATLAB from applying this optimization. Separating each clause within the 'res2' statement seems to be sufficient for this:
tic;i1=transpose(a);i2=transpose(a*b);res3=transpose(i1*i2);wait(gpuDevice);toc % Elapsed time is 0.066446 seconds.
In the above line, 'res3' is being generated from two intermediate matrices: 'i1' and 'i2'. The performance (on my system) seems to be on par with that of the 'res2' expression when executed from a script; in addition the 'res3' expression also shows similar performance when executed from a MATLAB function file. Note however that additional memory may be used to store the transposed copy of the initial array. Please let me know if you see different performance behavior on your system, and I can investigate this further.
Additionally, the 'res3' operation shows faster performance when measured with the 'gputimeit' function too. Please refer to the attached 'testscript2.m' file for more information on this. I have also attached 'test_v2.m' which is a modification of the 'test.m' function in your Stack Overflow post.
Thank you for reporting this issue to me. I would like to apologize for any inconvenience caused by this issue. I have created an internal bug report to notify the MATLAB developers about this behavior. They may provide a fix for this in a future release of MATLAB.
Since you had an additional question about comparing the performance of GPU code using 'gputimeit' vs. using 'tic' and 'toc', I just wanted to provide one suggestion which the MATLAB GPU computing developers had mentioned earlier. It is generally good to also call 'wait(gpuDevice)' before the 'tic' statements to ensure that GPU operations from the previous lines don't overlap in the measurement for the next line. For example, in the following lines:
b=gpuArray(rand(n));
tic; res1=a*b*a; wait(gpuDevice); toc
if the 'wait(gpuDevice)' is not called before the 'tic', some of the time taken to construct the 'b' array from the previous line may overlap and get counted in the time taken to execute the 'res1' expression. This would be preferred instead:
b=gpuArray(rand(n));
wait(gpuDevice); tic; res1=a*b*a; wait(gpuDevice); toc
Apart from this, I am not seeing any specific issues in the way that you are using the 'tic' and 'toc' functions. However note that using 'gputimeit' is generally recommended over using 'tic' and 'toc' directly for GPU-related profiling.
I will go ahead and close this case for now, but please let me know if you have any further questions about this.
%testscript2.m
n = 2000;
a = gpuArray(sprand(n, n, 0.01));
b = gpuArray(rand(n));
gputimeit(#()transpose_mult_fun(a, b))
gputimeit(#()transpose_mult_fun_2(a, b))
function out = transpose_mult_fun(in1, in2)
i1 = transpose(in1);
i2 = transpose(in1*in2);
out = transpose(i1*i2);
end
function out = transpose_mult_fun_2(in1, in2)
out = transpose(transpose(in1)*transpose(in1*in2));
end
.
function test_v2
clear
%% transposed expression
n = 2000;
rng('default');rng(1);
a = sprand(n, n, 0.1);
b = rand(n, n);
a = gpuArray(a);
b = gpuArray(b);
tic;
c1 = gather(transpose( transpose(a) * transpose(a * b) ));
disp(['time for (a''*(a*b)'')'': ' , num2str(toc),'s'])
clearvars -except c1
%% non-transposed expression
rng('default');
rng(1)
n = 2000;
a = sprand(n, n, 0.1);
b = rand(n, n);
a = gpuArray(a);
b = gpuArray(b);
tic;
c2 = gather(a * b * a);
disp(['time for a*b*a: ' , num2str(toc),'s'])
disp(['error = ',num2str(max(max(abs(c1-c2))))])
%% sliced equivalent
rng('default');
rng(1)
n = 2000;
a = sprand(n, n, 0.1);
b = rand(n, n);
a = gpuArray(a);
b = gpuArray(b);
tic;
intermediate1 = transpose(a);
intermediate2 = transpose(a * b);
c3 = gather(transpose( intermediate1 * intermediate2 ));
disp(['time for split equivalent: ' , num2str(toc),'s'])
disp(['error = ',num2str(max(max(abs(c1-c3))))])
end
EDIT 2 I might have been right, see this other answer
EDIT: They use MAGMA, which is column major. My answer does not hold, however I will leave it here for a while in case it can help crack this strange behavior.
The below answer is wrong
This is my guess, I can not 100% tell you without knowing the code under MATLAB's hood.
Hypothesis: MATLABs parallel computing code uses CUDA libraries, not their own.
Important information
MATLAB is column major and CUDA is row major.
There is no such things as 2D matrices, only 1D matrices with 2 indices
Why does this matter? Well because CUDA is highly optimized code that uses memory structure to maximize cache hits per kernel (the slowest operation on GPUs is reading memory). This means a standard CUDA matrix multiplication code will exploit the order of memory reads to make sure they are adjacent. However, what is adjacent memory in row-major is not in column-major.
So, there are 2 solutions to this as someone writing software
Write your own column-major algebra libraries in CUDA
Take every input/output from MATLAB and transpose it (i.e. convert from column-major to row major)
They have done point 2, and assuming that there is a smart JIT compiler for MATLAB parallel processing toolbox (reasonable assumption), for the second case, it takes a and b, transposes them, does the maths, and transposes the output when you gather.
In the first case however, you already do not need to transpose the output, as it is internally already transposed and the JIT catches this, so instead of calling gather(transpose( XX )) it just skips the output transposition is side. The same with transpose(a*b). Note that transpose(a*b)=transpose(b)*transpose(a), so suddenly no transposes are needed (they are all internally skipped). A transposition is a costly operation.
Indeed there is a weird thing here: making the code a function suddenly makes it slow. My best guess is that because the JIT behaves differently in different situations, it doesn't catch all this transpose stuff inside and just does all the operations anyway, losing the speed up.
Interesting observation: It takes the same time in CPU than GPU to do a*b*a in my PC.

No output while using "gmres" in Matlab

I want to solve a system of linear equalities of the type Ax = b+u, where A and b are known. I used a function in MATLAB like this:
x = #(u) gmres(A,b+u);
Then I used fmincon, where a value for u is given to this expression and x is computed. For example
J = #(u) (x(u)' * x(u) - x^*)^2
and
[J^*,u] = fmincon(J,...);
withe the dots as matrices and vectors for the equalities and inequalities.
My problem is, that MATLAB delivers always an output with information about the command gmres. But I have no idea, how I can stop this (it makes the Program much slower).
I hope you know an answer.
Patsch
It's a little hidden in the documentation, but it does say
No messages are displayed if the flag output is specified.
So you need to call gmres with at least two outputs. You can do this by making a wrapper function
function x = gmresnomsg(varargin)
[x,~] = gmres(varargin{:});
end
and use that for your handle creation
x = #(u) gmresnomsg(A,b+u);

How can I use of norm(a,b) in matlab if a, b are double type?

I must to use angle = atan2(norm(cross(a,b)),dot(a,b)), for calculating the angle between two vectors a,b and these are double type and norm is undefined for this type. How do I resolve this problem? I need to calculate the angle between two vectors this way.
In your comments, you have shown us how you are actually writing out the angle calculation and it is not the same as how you have put it in your post.
atan2(norm(cross(I(i,j,:),I_avg)),dot(I(i,j,:),I_avg));
I is an image you are loading in. I'm assuming it's colour because of the way you are subsetting I. Because I is a 3D matrix, doing I(i,j,:) will give you a 1 x 1 x 3 vector when in fact this has to be a 1D vector. norm does not recognize this structure which is why you're getting this error. Therefore, you need to use squeeze to remove the singleton dimensions so that this will become a 3 x 1 vector, rather than a 1 x 1 x 3 vector. As such, you need to rewrite your code so that you're doing this instead. Bear in mind that in your comments, angle is always overwritten inside the for loop, so you probably want to save the results of each pixel. With this, you probably want to create a 2D array of angles that will store these results. In other words:
I=imread('thesis.jpg');
I = double(I);
angles = zeros(m,n);
I_avg = squeeze(I_avg); %// Just in case
for i=1:m
for j=1:n
pixels = squeeze(I(i,j,:)); %// Add this statement and squeeze
angles(i,j) = atan2(norm(pixels,I_avg)),dot(pixels,I_avg)); %// Change
end
end
Minor note
MATLAB has a built-in function called angle that determines the angle from the origin to a complex number in the complex plane. It is not recommended you call your variable angle as this will unintentionally shadow over the angle function, and any other code that you create from this point onwards may rely on that actual angle function, and you will get unintended results.
Another minor note
Using i and j as loop variables is not recommended. These letters are reserved for the complex number, and this can produce unintentional results. Take a look at this question and post by Shai here - Using i and j as variables in Matlab. As such, it is suggested you use other variable names instead.
As #rayryeng has successfully answered this question, I would like to turn my post into a more general one by sharing my experience in debugging in Matlab. I hope anyone who somehow managed to find this post get more or less thinking about the habits a good programmer should have.
The question goes like: "How would I do if I get errors?"
Here's an excellent article by Eric in which he lists the rule-of-thumbs when you encounter a bug and wish to get rid of it. It's originally been cited by Stackoverflow, and that's the reason I read it.
If you still get no clue / idea how you can play with your code, see how this person does:
Pin-point the buggy line
(The number should start with 0) Make sure before running a script, you clear out any previously stored variables, including the notorious i and j's (you should never see them in any workspace). If any one is needed for the buggy code to run, save('buggy.mat','importantvar') before clear and load('buggy.mat') after clear.
By doing so, you can isolate your buggy code from anything else, which could have bad influences. For example, in a previously called script, there is a line
double = [2,4,6]; % you should never name a variable `double`
and in the next script, you have
>> e = str2num('uint8(200)')
e =
200
>> double(e)
Index exceeds matrix dimensions.
>>
>> f = single(2.36)
f =
2.3600
>> double(f)
Subscript indices must either be real positive integers or
logicals.
>>
The reason is double is no longer an inbuild function, but a user-defined variable. Too bad to pick up a name carelessly!
....anyway, let's clear the workspace and get rid of double.
>> clear
Read the error message, thoroughly.
Now let's begin with OP's problem. The original code (trimmed) goes like this -
img = imread('peppers.png');
a = img(300,200,:);
b = img(200,300,:);
d = norm(cross(a,b));
.... hence the error
Undefined function 'norm' for input arguments of type 'uint8'.
Error in untitled (line 6)
d = norm(cross(a,b));
Most beginners are only interested in the first line of the error message, which by it alone usually doesn't provide any useful help, or only in the red color, which leads to the famous question "my code does not work!"
But think twice. You still have another 2 lines unread! Error in untitled (line 6) says I'm running a script named untitled and the (first) error lies in line 6, and the code in that line is d = norm(cross(a,b));.
Now, at least you know a little more about your code - "My code d = norm(cross(a,b)); doesn't work!"
Although most likely we may also vote this kind of question to get closed, it's still much much better than a simply "It does not work!".
Now we can pin-point the buggy line
try
% this line will raise an error
d = norm(cross(a,b));
catch err
disp(err.message)
end
Look into the functions
First, make sure the inner function cross works as expected -
>> cross(a,b)
ans(:,:,1) =
0
ans(:,:,2) =
255
ans(:,:,3) =
0
>>
Good. So now we can even narrow down the error to the outer norm.
One more thing to mention. You can always find Mathworks' documentation for any in-build function, by typing "matlab function", such as "matlab norm" in Google (or any other search engine) and clicking on the first result. If you prefer, you can also type in Matlab command window doc _function_ such as doc norm and read the doc in Matlab. It's of course a pleasure of us on Stackoverflow to give you the reference by doing the same thing, but it takes a longer time because a human is, in this aspect, always slower than a search engine.
The error reads Undefined function 'norm' for input arguments of type 'uint8'.. So the input for norm should not be uint8, unsigned 8-bit integer. But what should it be?
% why `norm` "does not work"?
% this line runs perfectly well
norm(cross([1,2,3], [4,5,6]))
% so what is working?
class([1,2,3]) % so `norm` works for `double`
One thing we can do now is convert a and b to double precision. Let's try it now.
% try fixing 'uint8' error
a2 = double(a);
b2 = double(b);
whos a b % now they are double, which `norm` should work for
try
% this line will raise an error
d = norm(cross(a2,b2));
catch err
disp(err.message)
end
Now the error becomes Input must be 2-D.. What's wrong with the input?
% what is "must be 2-D" error?
size(a2) % a2 is 3-D
disp(b2) % b2 is also 3-D
This gives output in command window
ans =
1 1 3
(:,:,1) =
255
(:,:,2) =
150
(:,:,3) =
0
In OP's problem, he/she is trying to calculate something about color difference (to the best of my knowledge) which involves the angle between two color vectors in RGB space. So the vectors are needed. With imread, each pixel of the image is stored as 3 elements in the matrix, first 2 dimension being its physical position, the 3 dimension being RGB channel components. Hence pixel(200,300) with color rgb[255,150,0] is stored by us in variable b wihch is a 3-D vector.
By understanding what we need and what Matlab can do, we can combine these two points into one. We need the norm of the cross product of a and b, while the useful information (the 3 component values) is stored in the 3rd dimension. Matlab can calculate the norm of the cross product of a vector with all its information in the 1st dimension. (Here, "dimension" refers to that of the Matlab variable; a vector with 3 elements in its 1st dimension is physically a 3-D vector).
After thinking twice, we are now able to debug our code - just put all 3 elements into the 1st dimension.
% so we want the 3 elements in the 3rd dimension become in the 1st dim
a3 = squeeze(a2);
b3 = reshape(b2,numel(b2),[]);
try
d = norm(cross(a3,b3));
catch err
disp(err.message)
end
d
Bonus: If by default Matlab treats a 3-D vector as a "1-D array", then most probably the cross function has not been working correctly. Let's make a check -
>> clear
>> a = [1,2,3]
a =
1 2 3
>> b=[4,5,6]
b =
4 5 6
>> cross(a,b)
ans =
-3 6 -3
>>
The result should be the same as the one we can get by calculating by hand.
Now if we put the components into the 3rd dimension of the variable -
>> clear
>> a(1,1,:)=[1,2,3]
a(:,:,1) =
1
a(:,:,2) =
2
a(:,:,3) =
3
>> b(1,1,:)=[4,5,6]
b(:,:,1) =
4
b(:,:,2) =
5
b(:,:,3) =
6
>> cross(a,b)
ans(:,:,1) =
-3
ans(:,:,2) =
6
ans(:,:,3) =
-3
>>
.... seems OK. cross also puts the result in the 3rd dimension. In fact, Mathworks' documentation says
If A and B are vectors, then they must have a length of 3.
If A and B are matrices or multidimensional arrays, then they must
have the same size. In this case, the cross function treats A and B as
collections of three-element vectors. The function calculates the
cross product of corresponding vectors along the first array dimension
whose size equals 3.
At last, one thing is always correct to anyone who wants to do something with programming - be cautious and prudent when writing your code.

MATLAB: Using interpolation to replace missing values (NaN)

I have cell array each containing a sequence of values as a row vector. The sequences contain some missing values represented by NaN.
I would like to replace all NaNs using some sort of interpolation method, how can I can do this in MATLAB? I am also open to other suggestions on how to deal with these missing values.
Consider this sample data to illustrate the problem:
seq = {randn(1,10); randn(1,7); randn(1,8)};
for i=1:numel(seq)
%# simulate some missing values
ind = rand( size(seq{i}) ) < 0.2;
seq{i}(ind) = nan;
end
The resulting sequences:
seq{1}
ans =
-0.50782 -0.32058 NaN -3.0292 -0.45701 1.2424 NaN 0.93373 NaN -0.029006
seq{2}
ans =
0.18245 -1.5651 -0.084539 1.6039 0.098348 0.041374 -0.73417
seq{3}
ans =
NaN NaN 0.42639 -0.37281 -0.23645 2.0237 -2.2584 2.2294
Edit:
Based on the responses, I think there's been a confusion: obviously I'm not working with random data, the code shown above is simply an example of how the data is structured.
The actual data is some form of processed signals. The problem is that during the analysis, my solution would fail if the sequences contain missing values, hence the need for filtering/interpolation (I already considered using the mean of each sequence to fill the blanks, but I am hoping for something more powerful)
Well, if you're working with time-series data then you can use Matlab's built in interpolation function.
Something like this should work for your situation, but you'll need to tailor it a little ... ie. if you don't have equal spaced sampling you'll need to modify the times line.
nseq = cell(size(seq))
for i = 1:numel(seq)
times = 1:length(seq{i});
mask = ~isnan(seq{i});
nseq{i} = seq{i};
nseq{i}(~mask) = interp1(times(mask), seq{i}(mask), times(~mask));
end
You'll need to play around with the options of interp1 to figure out which ones work best for your situation.
I would use inpaint_nans, a tool designed to replace nan elements in 1-d or 2-d matrices by interpolation.
seq{1} = [-0.50782 -0.32058 NaN -3.0292 -0.45701 1.2424 NaN 0.93373 NaN -0.029006];
seq{2} = [0.18245 -1.5651 -0.084539 1.6039 0.098348 0.041374 -0.73417];
seq{3} = [NaN NaN 0.42639 -0.37281 -0.23645 2.0237];
for i = 1:3
seq{i} = inpaint_nans(seq{i});
end
seq{:}
ans =
-0.50782 -0.32058 -2.0724 -3.0292 -0.45701 1.2424 1.4528 0.93373 0.44482 -0.029006
ans =
0.18245 -1.5651 -0.084539 1.6039 0.098348 0.041374 -0.73417
ans =
2.0248 1.2256 0.42639 -0.37281 -0.23645 2.0237
If you have access to the System Identification Toolbox, you can use the MISDATA function to estimate missing values. According to the documentation:
This command linearly interpolates
missing values to estimate the first
model. Then, it uses this model to
estimate the missing data as
parameters by minimizing the output
prediction errors obtained from the
reconstructed data.
Basically the algorithm alternates between estimating missing data and estimating models, in a way similar to the Expectation Maximization (EM) algorithm.
The model estimated can be any of the linear models idmodel (AR/ARX/..), or if non given, uses a default-order state-space model.
Here's how to apply it to your data:
for i=1:numel(seq)
dat = misdata( iddata(seq{i}(:)) );
seq{i} = dat.OutputData;
end
Use griddedInterpolant
There also some other functions like interp1. For curved plots spline is the the best method to find missing data.
As JudoWill says, you need to assume some sort of relationship between your data.
One trivial option would be to compute the mean of your total series, and use those for missing data. Another trivial option would be to take the mean of the n previous and n next values.
But be very careful with this: if you're missing data, you're generally better to deal with those missing data, than to make up some fake data that could screw up your analysis.
Consider the following example
X=some Nx1 array
Y=F(X) with some NaNs in it
then use
X1=X(find(~isnan(Y)));
Y1=Y(find(~isnan(Y)));
Now interpolate over X1 and Y1 to compute all values at all X.