I need to compute a moving average over a data series, within a for loop. I have to get the moving average over N=9 days. The array I'm computing on consists of 4 series of 365 values (M), which are themselves mean values of another set of data. I want to plot the mean values of my data together with the moving average in one plot.
I googled a bit about moving averages and the "conv" command and found something which I tried implementing in my code:
hold on
for ii=1:4;
M=mean(C{ii},2)
wts = [1/24;repmat(1/12,11,1);1/24];
Ms=conv(M,wts,'valid')
plot(M)
plot(Ms,'r')
end
hold off
So basically, I compute my mean and plot it with a (wrong) moving average. I picked the "wts" value right off the mathworks site, so that is incorrect. (source: http://www.mathworks.nl/help/econ/moving-average-trend-estimation.html) My problem though, is that I do not understand what this "wts" is. Could anyone explain? If it has something to do with the weights of the values: that is invalid in this case. All values are weighted the same.
And if I am doing this entirely wrong, could I get some help with it?
My sincerest thanks.
There are two more alternatives:
1) filter
From the doc:
You can use filter to find a running average without using a for loop.
This example finds the running average of a 16-element vector, using a
window size of 5.
data = [1:0.2:4]';
windowSize = 5;
filter(ones(1,windowSize)/windowSize,1,data)
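One thing worth checking with the filter approach is which samples each output value averages. The sketch below is only an illustration (the names b, yFilter and yConv are made up here): it shows that filter uses a trailing window, so the first windowSize-1 outputs are computed as if the data were preceded by zeros, and that the result equals the first part of a full convolution.

```matlab
% Sketch: filter gives a trailing (causal) running average.
data = (1:0.2:4)';
windowSize = 5;
b = ones(1, windowSize) / windowSize;   % equal weights that sum to 1

yFilter = filter(b, 1, data);           % same length as data

% The same numbers come from a full convolution cut to the data length.
yConv = conv(data, b(:));
yConv = yConv(1:numel(data));
maxDiff = max(abs(yFilter - yConv));

% From sample windowSize onwards each output is the mean of the current
% sample and the windowSize-1 samples before it.
fifth = yFilter(5);                     % equals mean(data(1:5))
```

If a centered window is needed instead, the output has to be shifted by (windowSize-1)/2 samples, or conv with 'same' or 'valid' can be used.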
2) smooth as part of the Curve Fitting Toolbox (which is available in most cases)
From the doc:
yy = smooth(y) smooths the data in the column vector y using a moving
average filter. Results are returned in the column vector yy. The
default span for the moving average is 5.
%// Create noisy data with outliers:
x = 15*rand(150,1);
y = sin(x) + 0.5*(rand(size(x))-0.5);
y(ceil(length(x)*rand(2,1))) = 3;
%// Smooth the data using the loess and rloess methods with a span of 10%:
yy1 = smooth(x,y,0.1,'loess');
yy2 = smooth(x,y,0.1,'rloess');
In R2016a, MATLAB added the movmean function, which calculates a moving average:
N = 9;
M_moving_average = movmean(M,N)
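As a rough illustration of how movmean treats the ends of the series, the sketch below uses a made-up stand-in for M. By default the window shrinks near the ends, so the output keeps all 365 values; with the 'Endpoints','discard' option only full windows are kept, which for odd N matches conv with 'valid'.

```matlab
% Sketch: movmean end-point handling compared with conv.
M = sin((1:365)'/20) + 0.1*cos((1:365)');   % stand-in data (assumption)
N = 9;

maFull  = movmean(M, N);                          % 365 values, shorter windows at the ends
maValid = movmean(M, N, 'Endpoints', 'discard');  % full windows only: 365-N+1 values
maConv  = conv(M, ones(N,1)/N, 'valid');          % equally weighted conv, full windows only

maxDiff = max(abs(maValid - maConv));
```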
Using conv is an excellent way to implement a moving average. In the code you are using, wts is how much you are weighting each value (as you guessed). The sum of that vector should always be equal to one. If you wish to weight each value evenly and do a size N moving filter then you would want to do
N = 7;
wts = ones(N,1)/N;
sum(wts) % result = 1
Using the 'valid' argument in conv will result in having fewer values in Ms than you have in M (length(M)-N+1 of them). Use 'same' if you don't mind the effects of zero padding. If you have the Signal Processing Toolbox you can use cconv if you want to try a circular moving average. Something like
N = 7;
wts = ones(N,1)/N;
cconv(x,wts,length(x)); % third argument is the output length, not the window size
should work.
You should read the conv and cconv documentation for more information if you haven't already.
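Putting this together for the N = 9 case from the question, a minimal sketch could look like the following. The cell array C is filled with random stand-in data here (an assumption, since the real data is not shown), and the x-offset t is added so that each average is drawn at the center of its window.

```matlab
% Sketch: 9-point equally weighted moving average inside the loop.
rng(0);
C = {rand(365,10), rand(365,10), rand(365,10), rand(365,10)};  % stand-in data

N = 9;
wts = ones(N,1)/N;                 % equal weights, sum(wts) is 1

hold on
for ii = 1:4
    M  = mean(C{ii}, 2);           % 365 daily means
    Ms = conv(M, wts, 'valid');    % 365-N+1 = 357 averages over full windows
    t  = (1:numel(Ms))' + (N-1)/2; % center each average on its window
    plot(M)
    plot(t, Ms, 'r')
end
hold off
```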
I would use this:
% does moving average on signal x, window size is w
function y = movingAverage(x, w)
k = ones(1, w) / w;
y = conv(x, k, 'same');
end
ripped straight from here.
To comment on your current implementation: wts is the weighting vector, which in the MathWorks example is a 13-point average in which the first and last points get half the weight of the rest.
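One caveat with 'same' is that the zero padding pulls the first and last few averages toward zero. A possible workaround, sketched below with made-up names (coverage, yFixed), is to divide by the fraction of the window that actually overlaps the data. This is just one way to handle the edges, not the only one.

```matlab
% Sketch: compensating the zero-padding bias of conv(..., 'same').
x = 5 * ones(20, 1);      % constant test signal, so the true average is 5 everywhere
w = 9;
k = ones(w, 1) / w;

yPadded  = conv(x, k, 'same');               % biased low near both ends
coverage = conv(ones(size(x)), k, 'same');   % fraction of the window inside the data
yFixed   = yPadded ./ coverage;              % mean over the samples that exist
```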
I am a beginner and have an easy question. I have a signal on the y-axis and a time signal on the x-axis. I need to change the boundaries of the time signal. It's between 0 and 18 seconds, but I want to change it to between 5 and 10. I already used "xlim", which works for the plot, but I actually want to create a new time signal.
Any idea? Thank you!
Since you didn't post your code I'll need to make some assumptions. I'll assume you have your data stored in row vectors x and y and that x is uniform and monotonically increasing.
1. Construct a truncated signal using logical indexing.
index = x >= 5 & x <= 10;
x_new = x(index);
y_new = y(index);
plot(x_new, y_new);
The above only takes a subset of the data. If x doesn't contain exactly 5 and 10, the plot will stop short of those bounds. If you're dealing with time series data this is probably the most reasonable approach since it doesn't change the sampling rate.
2. Re-sampling the signal between 5 and 10 using interpolation.
num_samples = 100;
x_new = linspace(5, 10, num_samples);
y_new = interp1(x, y, x_new);
plot(x_new, y_new);
This may not exactly match the original plot since the original samples aren't guaranteed to be included. However it will exactly span the desired domain.
3. If you don't care that x is uniform but want to create a plot that exactly matches the original then you can append the bounds of x to the subset from method 1 and use interp1 to sample y.
x_min = 5; x_max = 10;
index = x > x_min & x < x_max;
x_new = [x_min, x(index), x_max];
y_new = interp1(x, y, x_new);
plot(x_new, y_new);
Example
Example demonstrating the differences between the different methods, plotted with additional offset and markings at samples for clarity.
If you want to delete the last n elements of a vector y and store the result in y_cut, you should be able to do that with:
y_cut = y(1:end-n);
It would be important to know in which form you stored the time signal.
If you have one value for each second the solution would be:
y_cut = y(5:10);
But I assume you're storing your y-values as samples with a given sample rate fs
One second would then be equal to fs (for example 44100 for a CD audio file, resulting in 44100 samples per second) and the solution would be:
y_cut = y(5*fs:10*fs);
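Since MATLAB indices start at 1, the exact sample numbers depend on when the first sample was taken. As a sketch, assuming the first sample is at t = 0 (as in the question's 0 to 18 s signal) and using an arbitrary example rate fs = 100, the indices can be computed like this:

```matlab
% Sketch: converting times to sample indices when the first sample is at t = 0.
fs = 100;                      % example sample rate in Hz (assumption)
t  = 0:1/fs:18;                % time axis from 0 to 18 s
y  = sin(2*pi*0.5*t);          % stand-in signal

iStart = round(5*fs)  + 1;     % index of the sample at t = 5 s
iEnd   = round(10*fs) + 1;     % index of the sample at t = 10 s

t_cut = t(iStart:iEnd);        % new time signal from 5 s to 10 s
y_cut = y(iStart:iEnd);
```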
I hope I could help.
Cheers,
Simon
I have two datasets (dat1 and dat2), each holding a particular metric from one of two images. I want both images to have the same response. The 'ideal' image should look like the first dataset (dat1)
but the real image looks like the second dataset (dat2).
I want to try to 'fit' the second dataset to the first dataset. How can I scale dat2 so that it looks like dat1 using MATLAB?
I have tried to fit dat1 with different polynomials, exponentials or Gaussians and then use the coefficients that I found to fit dat2, but the program fails and does not fit properly: it gives me a straight zero line. When I try to fit dat2 using the same shape while allowing the coefficients to be free, the program does not give me the ideal shape that I want because it follows the trends of dat2.
Is there any way to fit a dataset to a another set of data instead of a function?
Normally, in this situation, a very common approach consists in normalizing all the vectors between 0 and 1 (interval [0,1] with both extremes included). This can be easily achieved as follows:
dat1_norm = rescale(dat1);
dat2_norm = rescale(dat2);
If you have MATLAB R2017b or later, the rescale function is already included by default. Otherwise, it can be defined as follows:
function x = rescale(x)
x = x - min(x);
x = x ./ max(x);
end
In order to achieve the objective you mention (rescaling dat2 based on the minimum and maximum values of dat1), you can proceed as @cemsazara said in his comment:
dat2_scaled = rescale(dat2,min(dat1),max(dat1));
But this is a good solution only as long as you can identify the vector with the larger scale a priori. Otherwise, the risk is to rescale the smaller vector based on the values of the bigger one. That's why the first approach I suggested may be a more comfortable solution.
In order to adopt this second approach, if your MATLAB version is older than R2017b, you must modify the custom rescale function defined above to accept two additional arguments (the lower and upper bounds of the target interval):
function x = rescale(x,mn,mx)
if (nargin == 1)
mn = 0; % default target interval is [0,1]
mx = 1;
elseif ((nargin == 0) || (nargin == 2))
error('Invalid number of arguments supplied.');
end
x = x - min(x); % shift so that the minimum is 0
x = x ./ max(x); % scale so that the maximum is 1
x = mn + x .* (mx - mn); % map [0,1] onto [mn,mx]
end
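For reference, the mapping of one vector onto the range of another can also be written out in a single expression without any helper function. The values of dat1 and dat2 below are made up purely for illustration.

```matlab
% Sketch: map dat2 linearly onto the [min, max] range of dat1.
dat1 = [2 4 6 8 10];           % hypothetical 'ideal' data
dat2 = [10 30 20 50 40];       % hypothetical 'real' data

lo = min(dat1);
hi = max(dat1);

% Normalize dat2 to [0,1], then stretch and shift to [lo,hi].
dat2_scaled = lo + (dat2 - min(dat2)) ./ (max(dat2) - min(dat2)) .* (hi - lo);
```

This only matches the two ranges. It preserves the shape of dat2, so it will not make the two curves identical if their shapes differ.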
I am using Matlab for this (preferable idea).
I need to multiply the frequencies of a histogram by a scalar value (one for each bin).
I have tried the approach from a similar question, but it is defined for hist and not for the histogram function.
This is my original distribution that needs to be multiplied:
This is what I get using the approach given in the similar question:
Additionally, when I finish this part I will have more histograms that I need to sum up into one histogram. So how would I do that? They might have different ranges.
The documentation clearly explains how to replicate the behavior of hist with histogram.
For example:
A = rand(100, 1);
h = histogram(A);
figure
h_new = histogram('BinCounts', h.Values*2, 'BinEdges', h.BinEdges);
Generates the following histograms:
You can modify the BinCounts like this:
X = normrnd(0,1,1000,1); % some data
h = histogram(X,3); % histogram with 3 bins
h.BinCounts = h.Values.*[3 5 1]; % scale each bin by factor 3, 5 and 1 respectively
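For the follow-up about summing several histograms with different ranges, one possible approach (a sketch, not taken from the answers above) is to count every data set on a common set of bin edges with histcounts, combine the scaled counts, and plot the result from 'BinCounts' and 'BinEdges'. The data sets A and B and the scale factor 2 are made up for illustration.

```matlab
% Sketch: scale and add histograms of data sets with different ranges.
rng(1);
A = randn(1000, 1);             % first data set (stand-in)
B = 2 + 0.5*randn(500, 1);      % second data set with a different range

% Common bin edges spanning both data sets.
edges = linspace(min([A; B]), max([A; B]), 21);

countsA = histcounts(A, edges);
countsB = histcounts(B, edges);

total = 2*countsA + countsB;    % scale A by 2, then add B bin by bin
histogram('BinCounts', total, 'BinEdges', edges);
```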
I have a vector of data, which contains integers in the range [-20, 20].
Below is a plot with the values:
This is a sample of 96 elements from the vector data. The majority of the elements are situated in the interval -2, 2, as can be seen from the above plot.
I want to eliminate the noise from the data. I want to eliminate the low amplitude peaks, and keep the high amplitude peak, namely, peaks like the one at index 74.
Basically, I just want to increase the contrast between the high amplitude peaks and low amplitude peaks, and if it would be possible to eliminate the low amplitude peaks.
Could you please suggest a way of doing this?
I have tried the mapstd function, but the problem is that it also normalizes that high amplitude peak.
I was thinking of using the Wavelet Toolbox, but I don't know exactly how to reconstruct the data from the wavelet decomposition coefficients.
Can you recommend a way of doing this?
One approach to detect outliers is to use the three standard deviation rule. An example:
%# some random data resembling yours
x = randn(100,1);
x(75) = -14;
subplot(211), plot(x)
%# tone down the noisy points
mu = mean(x); sd = std(x); Z = 3;
idx = ( abs(x-mu) > Z*sd ); %# outliers
x(idx) = mu + Z*sd .* sign(x(idx)-mu); %# cap values at 3*STD(X) around the mean
subplot(212), plot(x)
EDIT:
It seems I misunderstood the goal here. If you want to do the opposite, maybe something like this instead:
%# some random data resembling yours
x = randn(100,1);
x(75) = -14; x(25) = 20;
subplot(211), plot(x)
%# zero out everything but the high peaks
mu = mean(x); sd = std(x); Z = 3;
x( abs(x-mu) < Z*sd ) = 0;
subplot(212), plot(x)
If it's for demonstrative purposes only, and you're not actually going to be using these scaled values for anything, I sometimes like to increase contrast in the following way:
% your data is in variable 'a'
plot(a.*abs(a)/max(abs(a)))
edit: since we're posting images, here's mine (before/after):
You might try a split window filter. If x is your current sample, the filter would look something like:
k = [L L L L L L 0 0 0 x 0 0 0 R R R R R R]
For each sample x, you average a band of surrounding samples on the left (L) and a band of surrounding samples on the right. If your samples are positive and negative (as yours are) you should take the abs. value first. You then divide the sample x by the average value of these surrounding samples.
y[n] = x[n] / mean(abs(x([L R])))
Each time you do this the peaks are accentuated and the noise is flattened. You can do more than one pass to increase the effect. It is somewhat sensitive to the selection of the widths of these bands, but can work. For example:
Two passes:
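The split window filter described above could be sketched as follows. The band and gap widths are taken from the kernel shown earlier (6 samples per band, 3 guard samples on each side), the test signal is made up, and the neighborhood mean is computed with conv. Near the ends of the vector the zero padding makes the neighborhood mean too small, so the first and last few outputs are exaggerated.

```matlab
% Sketch: one pass of a split window filter.
band = 6;                          % samples averaged on each side
gap  = 3;                          % guard samples between x and each band

x = repmat([1; -1], 50, 1);        % low-amplitude 'noise' (stand-in)
x(50) = 20;                        % one high-amplitude peak

% Kernel [L..L 0 0 0 x 0 0 0 R..R]: equal weights on both bands, zero elsewhere.
kernel = [ones(band,1); zeros(2*gap+1,1); ones(band,1)] / (2*band);

bg = conv(abs(x), kernel, 'same'); % mean absolute value of the surrounding bands
y  = x ./ max(bg, eps);            % divide each sample by its surroundings
```

Running the last two lines again on y gives a second pass.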
What you actually need is some kind of compression to scale your data, that is: values between -2 and 2 are scaled by a certain factor and everything else is scaled by another factor. A crude way to accomplish such a thing is to set all small values to zero, i.e.
x = randn(1,100)/2; x(50) = 20; x(25) = -15; % just generating some data
threshold = 2;
smallValues = (abs(x) <= threshold);
y = x;
y(smallValues) = 0;
figure;
plot(x,'DisplayName','x'); hold on;
plot(y,'r','DisplayName','y');
legend show;
Please do note that this is a very nonlinear operation (e.g. when you have wanted peaks valued at 2.1 and 1.9, they will produce very different behavior: one will be kept, the other will be removed). So for displaying, this might be all you need; for further processing it might depend on what you are trying to do.
To eliminate the low amplitude peaks, you're going to equate all the low amplitude signal to noise and ignore.
If you have any a priori knowledge, just use it.
If your signal is a, then
a(abs(a)<X) = 0
where X is the max expected size of your noise.
If you want to get fancy, and find this "on the fly" then, use kmeans of 3. It's in the statistics toolbox, here:
http://www.mathworks.com/help/toolbox/stats/kmeans.html
Alternatively, you can use Otsu's method on the absolute values of the data, and use the sign back.
Note that these and every other technique I've seen in this thread assume you are post-processing. If you are doing this processing in real time, things will have to change.
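As a rough sketch of the kmeans idea, assuming the Statistics Toolbox is available: cluster the values into three groups, treat the cluster whose center is closest to zero as noise, and zero it out. The test data and the 'Replicates' setting are arbitrary choices for illustration, and with real data the three clusters will not always separate this cleanly.

```matlab
% Sketch: find the noise level on the fly with kmeans (3 clusters).
rng(0);
a = randn(200, 1) / 2;             % low-amplitude noise (stand-in)
a(50)  = 20;                       % high positive peak
a(120) = -15;                      % high negative peak

% Three clusters: negative peaks, noise, positive peaks.
[idx, c] = kmeans(a, 3, 'Replicates', 5);

% The cluster with the center nearest to zero is taken to be the noise.
[~, noiseCluster] = min(abs(c));

cleaned = a;
cleaned(idx == noiseCluster) = 0;  % zero out the noise, keep the peaks
```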
I have a MATLAB function that finds characteristic points in a sample. Unfortunately it only works about 90% of the time. But when I know at which places in the sample I am supposed to look I can increase this to almost 100%. So I would like to know if there is a function in MATLAB that would allow me to find the range where most of my results are, so I can then recalculate my characteristic points. I have a vector which stores all the results, and the right results should lie inside a range of 3% between -24.000 and 24.000, whereas wrong results are always lower than the correct range. Unfortunately my background in statistics is very rusty so I am not sure what this would be called.
Can somebody give me a hint what I would be looking for? Is there a function built into MATLAB that would give me the smallest possible range where e.g. 90% of the results lie?
EDIT: I am sorry if I didn't make my question clear. Everything in my vector can only range between -24.000 and 24.000. About 90% of my results will be in a range which spans approximately 1.44 ([24-(-24)]*3% = 1.44). These are very likely to be the correct results. The remaining 10% are outside of that range and always lower (which is why I am not sure taking the mean value is a good idea). These 10% are false and result from blips in my input data. To find the remaining 10% I want to repeat my calculations, but now I only want to check the small range.
So, my goal is to identify where my correct range lies, delete the values I have found outside of that range, and then recalculate my values, not on a range between -24.000 and 24.000, but rather on the small range where I already found 90% of my values.
The relevant points you're looking for are the percentiles:
% generate sample data
data = [randn(900,1) ; randn(50,1)*3 + 5 ; randn(50,1)*3 - 5];
subplot(121), hist(data)
subplot(122), boxplot(data)
% find 5th, 95th percentiles (range that contains 90% of the data)
limits = prctile(data, [5 95])
% find data in that range
reducedData = data(limits(1) < data & data < limits(2));
Other approaches exist to detect outliers, such as the IQR outlier test and the three standard deviation rule, among many others:
%% three standard deviation rule
z = 3;
bounds = z * std(data)
reducedData = data( abs(data-mean(data)) < bounds );
and
%% IQR outlier test
Q = prctile(data, [25 75]);
IQ = Q(2)-Q(1);
%a = 1.5; % mild outlier
a = 3.0; % extreme outlier
bounds = [Q(1)-a*IQ , Q(2)+a*IQ]
reducedData = data(bounds(1) < data & data < bounds(2));
BTW if you want to get the z value (|X|<z) that corresponds to 90% area under the curve, use:
area = 0.9; % two-tailed probability
z = norminv(1-(1-area)/2)
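Note that the 5th and 95th percentiles give the central 90% range, which is not necessarily the smallest one when the outliers all lie on one side, as in the question. A possible sketch for the shortest interval containing 90% of the values, using only sort, is shown below. The test data (900 values near 20 and 100 lower ones) is made up to resemble the description in the question.

```matlab
% Sketch: shortest interval that contains 90% of the data.
rng(2);
data = [20 + 0.3*randn(900,1); -24 + 40*rand(100,1)];  % stand-in data

s = sort(data);
n = numel(s);
m = ceil(0.9 * n);                    % number of points the interval must hold

% Width of every interval that starts at s(i) and holds m consecutive points.
widths = s(m:n) - s(1:n-m+1);
[wMin, i] = min(widths);

limits = [s(i), s(i+m-1)];            % shortest such interval
reducedData = data(data >= limits(1) & data <= limits(2));
```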
Maybe you should try mean value (in matlab: mean) and standard deviation (in matlab: std)?
What is the statistic distribution of your data?
See also this wiki page, section "Interpretation and application".
In general, Chebyshev's inequality holds for any distribution with finite mean and variance, which makes it very useful here.
In most cases this should work:
meanVal = mean(data)
stDev = std(data)
and at least 75% of your values will lie in the range (by Chebyshev's inequality, whatever the distribution):
<meanVal - 2*stDev, meanVal + 2*stDev>
It seems like maybe you want to find the number x in [-24,24] that maximizes the number of sample points in [x,x+1.44]. Probably the fastest way to do this involves a sort of the sample points, which is ultimately O(n log n) time. A cheesy approximation would be as follows:
n_brkpoints = 500; %assumed value: choose n_brkpoints big, but < # of sample points?
brkpoints = linspace(-24,24-1.44,n_brkpoints);
n_count = histc(data(:),[brkpoints,inf]); %count # data points between breakpoints;
accbins = round(1.44 / (brkpoints(2) - brkpoints(1))); %# of bins to accumulate (must be an integer);
cscount = cumsum(n_count); %half of the boxcar sum computation;
boxsum = cscount - [zeros(accbins,1);cscount(1:end-accbins)]; %2nd half: sum of the accbins bins ending at each bin;
[dum,maxi] = max(boxsum); %which interval has the maximal # counts?
lorange = brkpoints(max(maxi-accbins+1,1)); %the lower range: left edge of the first bin in that window;
hirange = lorange + 1.44
this solution does fudge some of the corner case stuff about the bottom and top bin, etc.
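For comparison, the sort-based version mentioned at the start of this answer could be sketched like this. It only considers windows that start at a data point, which is enough because shifting any window to the right until its left edge hits the next point never loses a point. The test data is made up, and the variable names mirror the ones above.

```matlab
% Sketch: exact search for the width-1.44 window holding the most points.
rng(3);
data  = [20 + 0.3*randn(900,1); -24 + 40*rand(100,1)];  % stand-in data
width = 1.44;

s = sort(data(:));
n = numel(s);
counts = zeros(n, 1);

j = 1;                                 % first index that is beyond the current window
for i = 1:n
    while j <= n && s(j) <= s(i) + width
        j = j + 1;
    end
    counts(i) = j - i;                 % points in [s(i), s(i)+width]
end

[bestCount, iBest] = max(counts);
lorange = s(iBest);                    % the lower range
hirange = lorange + width;
```

After the sort, the two indices only ever move forward, so the loop itself is linear in the number of points.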
Note that if you're going to go the Chebyshev inequality route, the Vysochanskij-Petunin inequality (which applies to unimodal distributions) is probably applicable, and will give a slightly tighter bound.