How to bin values and plot - matlab

I have a dataset with two columns: the first column is duration (length of time, e.g. 5 min) and the second column is firing rate. Is it possible to bin the firing rates by their corresponding duration (e.g. 5, 10, 15 min) and then plot bars with firing rate on the y axis and time on the x axis?

I'm sure this can be accomplished without the for loop. Solution below uses the discretize function to accomplish the grouping. Other approaches possible.
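% MATLAB R2017a
% Sample data
D = 20*rand(25,1);           % durations (min)
FR = 550*rand(25,1);         % firing rates
D_edges = (0:5:20)';         % bin edges: 5 edges define 4 bins
ind = discretize(D,D_edges); % groups data (bin index per duration)
nBins = length(D_edges)-1;
FR_mean = zeros(nBins,1);
for k = 1:nBins
    FR_mean(k) = mean(FR(ind==k)); % NaN if a bin is empty
end
D_centers = D_edges(1:end-1) + diff(D_edges)/2;
bar(D_centers,FR_mean) % bar plot at the bin centers
% Cosmetics
xlabel('Duration (min)')
ylabel('Mean Firing Rate (unit)')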
I'm positive there's a more efficient way to get the means for each group, possibly using arrayfun or some other nifty functions, but will hold off until OP provides more details.
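One loop-free option, as a minimal sketch: accumarray can apply mean to every bin in a single call. The toy values and the name edges are my own, and this assumes every duration falls inside the edges (discretize returns NaN otherwise, which accumarray rejects as a subscript).

```matlab
% Toy data (hypothetical values chosen so the means are easy to check)
D  = [1; 3; 6; 7; 12; 18];      % durations (min)
FR = [10; 20; 30; 50; 40; 60];  % firing rates
edges = 0:5:20;                 % 5 edges -> 4 bins
ind = discretize(D, edges);     % bin index of each duration
% Mean firing rate per bin; bins with no data are filled with NaN
FR_mean = accumarray(ind, FR, [numel(edges)-1, 1], @mean, NaN);
```

The third argument fixes the output size to the number of bins, so trailing empty bins still show up (as NaN) instead of being dropped.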

Related

matlab, frequency calculation and code review

I've got a logical/statistical problem:
I have to find out whether the firing rates of a neuron in response to four different stimuli are significantly different.
I calculated the frequencies via the PSTH/binning method in MATLAB, and I am not sure it was the right way. Afterwards I ran an ANOVA and a Tukey test in JMP. At first sight it looks good, but as mentioned, I don't think the calculation was correct.
Maybe this isn't the right forum for my problem, but perhaps someone can find my mistake or has a better solution. Thanks :D
bins is the number of bins, calculated as the total duration (800 ms) divided by binwidth (10 ms).
In the end, this function should give me a histogram of frequency over time (ms) and return the frequencies (here a 1x80 vector with the average frequency per bin).
Doing this for the four stimuli gave me 4 vectors, which I put into JMP to run the Tukey test.
function [freq] = BinFireRate(data, dur, times_snippet, binwidth)
%function that plots the firing rate of a given dataset via the binning method in [Hz]
%in: data (n x m-matrix), dur as the duration observed per trial
%times_snippet (1,n-vector) to convert data into time values [ms]
%binwidth [ms]
%out: histogram of firing rate (freq) over time, and the frequencies [Hz]
%as a [1x80-vector]
bins = dur / binwidth;
spiketimes_stim = data .* times_snippet;
spiketimes_stim = spiketimes_stim(spiketimes_stim ~= 0);
[spikes_per_bin, bincenters] = hist(spiketimes_stim, bins);
freq = ((spikes_per_bin / binwidth) / length(data(:, 1))) * 1000;
bar(bincenters, freq);
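For comparison, here is a minimal sketch of the same binning with explicit bin edges, so that every bin is exactly binwidth wide and spans 0 to dur. Note that hist with a bin count spaces the bins between the smallest and largest spike time instead. The toy matrix, the sample times, and the assumption that rows of data are trials with 1 marking a spike are mine, not taken from the original data.

```matlab
% Hypothetical toy input: 2 trials (rows), 8 samples (columns), 1 = spike
data = [0 1 0 0 1 0 0 1;
        0 1 0 0 0 0 1 0];
times = 5:10:75;                 % assumed sample times in ms
dur = 80;                        % trial duration in ms
binwidth = 20;                   % bin width in ms
edges = 0:binwidth:dur;          % explicit edges: 4 bins of 20 ms each
[~, col] = find(data);           % column index of every spike
spiketimes = times(col);         % spike times in ms, pooled over trials
counts = histcounts(spiketimes, edges);
% spikes per bin -> spikes per trial -> spikes per second (Hz)
freq = counts / size(data,1) / (binwidth/1000);
```

Using find instead of multiplying by the time vector also keeps a spike that happens to sit at time 0, which the ~= 0 filter would discard.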

How to find delay between two sets of data in Matlab?

I have two sets of data taken from experiments, and they look very similar, except there is a horizontal offset between them, which I believe is due to some bugs in the instrument setting. Suppose they have the form y1=f(x1) and y2=f(x2)= f(x1+c), what's the best way to determine c so that I can take into account the offset to superimpose two data sets to become one data set?
Edit:
let's say my data sets (index 1 and 2) have the form:
x1 = 0:0.2:10;
y1 = sin(x1)
x2 = 0:0.3:10;
y2 = sin(x2+0.5)
Of course, the real data will have some noise, but say the best-fit functions have the above forms. How do I find the offset c=0.5? I have looked into cross-correlation, but I'm not sure it can handle two data sets with different numbers of points (and different step sizes). Also, what if the offset falls between two data points? If I understand correctly, cross-correlation only returns an index into the array, not something in between.
This Matlab script calculates the random offset from -pi/2 to +pi/2 between two sine waves:
clear;
C= pi*(rand-0.5); % should be between -pi/2 and +pi/2
N=200; % should be large enough for acceptable sampling rate
N1=20; % fraction for Ts1
N2=30; % fraction for Ts2
Ts1=abs(C*N1/N); % fraction of C for accuracy
Ts2=abs(C*N2/N); % fraction of C for accuracy
Ts=min(Ts1,Ts2); % select highest sampling rate (smaller period)
fs=1/Ts;
P=4; % number of periods should be large enough for well defined correlation plot
x1 = 0:Ts:P*2*pi;
y1 = sin(x1);
x2 = 0:Ts:P*2*pi;
y2 = sin(x2+C);
subplot(3,1,1)
plot(x1,y1);
subplot(3,1,2);
plot(x2,y2);
[cor,lag]=xcorr(y1,y2);
subplot(3,1,3);
plot(lag,cor);
[~,I] = max(abs(cor));
lagdiff = lag(I);
timediff=lagdiff/fs;
In one particular run, C = timediff = 0.5615.
Write a function that takes the time shift as an input and calculates the RMS difference between the overlapping portions of the two data sets. Then find the minimum of this function using an optimizer (fminbnd).
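A minimal sketch of that idea, using the example data from the question. The search bounds of -1 to 1 and the choice of linear interpolation are my assumptions. Linear interp1 returns NaN outside the range of x1, which conveniently marks the non-overlapping part.

```matlab
% Example data from the question (different step sizes, offset c = 0.5)
x1 = 0:0.2:10;  y1 = sin(x1);
x2 = 0:0.3:10;  y2 = sin(x2 + 0.5);
% Residual between data set 2 and data set 1 evaluated at shifted positions;
% points that fall outside x1 become NaN and are ignored below
resid = @(c) interp1(x1, y1, x2 + c) - y2;
cost  = @(c) sqrt(mean(resid(c).^2, 'omitnan'));  % RMS over the overlap
c_est = fminbnd(cost, -1, 1);    % assumed search interval for the shift
```

Because the shift is a continuous variable here, the estimate is not restricted to multiples of the sampling step, and c_est should come out close to 0.5 for this noise-free example.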

Remove noise on a wav file

I'm working on a small piece of code to learn signal processing in Matlab. I have a .wav sound with some noise and I just want to remove the noise. I tried the code below, but the noise is not removed correctly. My idea is to apply a band-stop filter to remove the different noise components seen in the FFT. After a lot of research, I don't understand where my problem is. Here is my code:
clear all;
clc;
close all;
% We load and plot the signal
[sig,Fs] = audioread('11.wav');
N = length(sig);
figure,plot(sig); title 'signal'
% FFT of the signal
fft_sig = abs(fft(sig));
% Normalisation of the frequencies for the plot
k = 0 : length(sig) - 1;
f = k*Fs/length(sig);
plot(f,fft_sig);
% Loop with 2 elements because there are 2 amplitudes (not sure about
% this)
for i = 1:2
% I put noise components in an array
[y(i),x(i)] = max(fft_sig);
% Normalisation of the frequencies to eliminate
f(i) = x(i) * (Fs/N);
% Cut band filter with elimination of f from f-20 to f+20 Hz
b = fir1(2 , 2 * [f(i)-20 f(i)+20] / Fs, 'stop')
sig = filter(b,1,sig);
figure,plot(abs(fft(sig)));title 'fft of the signal'
end
In the plots I got, the FFT is exactly the same before and after applying the filter; the only change is on the x axis.
The sampling frequency is Fs = 22050.
Thanks in advance for your help, I hope I'm clear enough in my description
Since you haven't explicitly said so, the code you provided essentially defines your noise as narrowband interference at two frequencies (even though that interference may look less 'noisy').
The first thing to notice is that the value x(i) obtained from max(fft_sig) corresponds to the 1-based index of the located maximum. It won't make a huge difference for large N, but it may have an impact for smaller values especially when trying to design a very narrow notch filter. The corresponding frequency would then be:
freq = (x(i)-1) * (Fs/N);
Also if you are going to iteratively attenuate frequency components you would need to update fft_sig which you use to pick the frequency components to be attenuated (otherwise you will always pick the same component). The simplest would be to recompute fft_sig from the filtered sig:
sig = filter(b,1,sig);
fft_sig = abs(fft(sig));
Finally, since you are trying to attenuate a very narrow frequency range, you may find that you need to increase the FIR filter order by a few orders of magnitude to get good attenuation at the desired frequency without attenuating everything else. As was pointed out in the comments, such a narrow notch filter can often be better implemented using an IIR filter.
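To illustrate the IIR alternative, here is a minimal sketch using a low-order Butterworth band-stop. It assumes the Signal Processing Toolbox (butter, filtfilt) is available; the synthetic signal and the 3000 Hz interference frequency are made up for the example and are not taken from your file.

```matlab
% Synthetic test signal (assumed values, not the asker's recording)
Fs = 22050;                        % sampling frequency (Hz)
t  = (0:Fs-1)'/Fs;                 % 1 second of samples
sig = sin(2*pi*440*t) + 0.5*sin(2*pi*3000*t);   % tone + interference
f0 = 3000;                         % interference frequency (Hz)
% 2nd-order Butterworth band-stop from f0-20 to f0+20 Hz
[b, a] = butter(2, [f0-20 f0+20]/(Fs/2), 'stop');
clean = filtfilt(b, a, sig);       % zero-phase filtering
% Spectrum magnitudes; with 1 s of data, bin k+1 corresponds to k Hz
S0 = abs(fft(sig));
S1 = abs(fft(clean));
```

Comparing S1 and S0 around 3000 Hz and around 440 Hz shows how much the interference is attenuated and how little the rest of the signal is touched, with a far lower order than the FIR design.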
The updated code would then look somewhat like:
% Loop with 2 elements because there are 2 amplitudes
for i = 1:2
% I put noise components in an array
% Limit peak search to the lower-half of the spectrum since the
% upper half is symmetric
[y(i),x(i)] = max(fft_sig(1:floor(N/2)+1));
% Normalisation of the frequencies to eliminate
freq = (x(i)-1) * (Fs/N);
% Cut band filter with elimination of f from f-20 to f+20 Hz
b = fir1(2000 , 2 * [freq-20 freq+20] / Fs, 'stop')
sig = filter(b,1,sig);
fft_sig = abs(fft(sig));
%figure;plot(f, abs(fft_sig));title 'fft of the signal'
figure;plot(f, 20*log10(abs(fft_sig)));title 'fft of the signal'
end
As for the last FFT plot showing a different x-axis, that is simply because you did not pass an x-axis variable (in this case f, used in the previous plot), so the plot shows the array index instead of frequency on the x-axis.

Computing a moving average

I need to compute a moving average over a data series, within a for loop. I have to get the moving average over N=9 days. The array I'm computing in is 4 series of 365 values (M), which itself are mean values of another set of data. I want to plot the mean values of my data with the moving average in one plot.
I googled a bit about moving averages and the "conv" command and found something which I tried implementing in my code:
hold on
for ii=1:4;
M=mean(C{ii},2)
wts = [1/24;repmat(1/12,11,1);1/24];
Ms=conv(M,wts,'valid')
plot(M)
plot(Ms,'r')
end
hold off
So basically, I compute my mean and plot it with a (wrong) moving average. I picked the "wts" value right off the mathworks site, so that is incorrect. (source: http://www.mathworks.nl/help/econ/moving-average-trend-estimation.html) My problem though, is that I do not understand what this "wts" is. Could anyone explain? If it has something to do with the weights of the values: that is invalid in this case. All values are weighted the same.
And if I am doing this entirely wrong, could I get some help with it?
My sincerest thanks.
There are two more alternatives:
1) filter
From the doc:
You can use filter to find a running average without using a for loop.
This example finds the running average of a 16-element vector, using a
window size of 5.
data = [1:0.2:4]';
windowSize = 5;
filter(ones(1,windowSize)/windowSize,1,data)
2) smooth as part of the Curve Fitting Toolbox (which is available in most cases)
From the doc:
yy = smooth(y) smooths the data in the column vector y using a moving
average filter. Results are returned in the column vector yy. The
default span for the moving average is 5.
% Create noisy data with outliers:
x = 15*rand(150,1);
y = sin(x) + 0.5*(rand(size(x))-0.5);
y(ceil(length(x)*rand(2,1))) = 3;
% Smooth the data using the loess and rloess methods with a span of 10%:
yy1 = smooth(x,y,0.1,'loess');
yy2 = smooth(x,y,0.1,'rloess');
In 2016 MATLAB added the movmean function that calculates a moving average:
N = 9;
M_moving_average = movmean(M,N)
Using conv is an excellent way to implement a moving average. In the code you are using, wts is how much you are weighting each value (as you guessed). The sum of that vector should always be equal to one. If you wish to weight each value evenly with a window of size N, then you would want to do
N = 7;
wts = ones(N,1)/N;
sum(wts) % result = 1
Using the 'valid' argument in conv will result in having fewer values in Ms than you have in M. Use 'same' if you don't mind the effects of zero padding. If you have the Signal Processing Toolbox you can use cconv if you want to try a circular moving average. Something like
N = 7;
wts = ones(N,1)/N;
cconv(x,wts,length(x)); % third argument is the output length
should work.
You should read the conv and cconv documentation for more information if you haven't already.
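As a small sketch of the difference between 'valid' and 'same' for your N=9 case (the series M here is just a stand-in ramp, not your data):

```matlab
M = (1:20)';                 % stand-in for one data series (hypothetical)
N = 9;                       % window length in days
wts = ones(N,1)/N;           % equal weights that sum to 1
Ms_valid = conv(M, wts, 'valid');  % only windows that fit entirely in M
Ms_same  = conv(M, wts, 'same');   % same length as M, edges see zero padding
% x positions that center the shorter 'valid' result on the original data
x_valid = (1:numel(Ms_valid))' + (N-1)/2;
```

Ms_valid has numel(M)-N+1 elements, so plotting it against x_valid keeps it aligned with plot(M). Ms_same lines up directly, but its first and last (N-1)/2 values are pulled toward zero by the padding.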
I would use this:
% does moving average on signal x, window size is w
function y = movingAverage(x, w)
k = ones(1, w) / w;
y = conv(x, k, 'same');
end
ripped straight from here.
To comment on your current implementation: wts is the weighting vector. The one from the MathWorks example is a 13-point average in which the first and last points get half the weight of the others.

Moving average for time series with unequal intervals

I have a dataset of prices for a ticker on the stock exchange: time - price. But the intervals between data points are not equal; they range from 1 to 2 minutes.
What is the best practice for calculating a moving average in such a case?
How can I do it in Matlab?
I tend to think that the weight of each point should depend on the time elapsed since the previous point. Is there a function in Matlab to calculate a moving average with custom weights for the points?
Here is an example of the "naive" approach I mentioned in the comments above:
% some data (unequally spaced in time, but monotonically non-decreasing)
t = sort(rand(50,1));
x = cumsum(rand(size(t))-0.5);
% linear interpolation on equally-spaced intervals
tt = linspace(min(t), max(t), numel(t));
xx = interp1(t, x, tt, 'linear');
% plot two data vectors
plot(t, x, 'b.-', tt, xx, 'r.:')
legend({'original', 'equally-spaced'})
My answer is quite similar to lakesh's. But I will approach your problem in terms of interpolation.
First of all, a moving average, or a time average of a function, is the integral of it over a time period, divided by the time length.
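As a small illustration of that definition, here is a sketch that computes a time average as integral over duration with trapz, next to the plain mean. The sample values are made up, and trapz assumes the function varies linearly between samples.

```matlab
t = [0 1 2 10];      % sample times (hypothetical, unequal spacing)
x = [0 0 0 10];      % sampled values
avg_time  = trapz(t, x) / (t(end) - t(1));  % integral / length of time
avg_plain = mean(x);                        % ignores the spacing
```

Here avg_time is 4 while avg_plain is 2.5, because the long last interval carries more weight in the integral.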
In your case, the integral can be seen as a sum, since in general the function value is taken to be constant within each minute. However, your data has unequal time intervals. This can be seen as missing points of the function. Let me explain: for each minute x, you should have a price f(x). But for some times, say x=5, f(x) is undefined.
One way to fill such gaps in a function is interpolation: assign a value to each missing point according to some rule. The simplest algorithm is "keeping the previous value", which is essentially lakesh's idea.
But the benefit of thinking this way lies in the ability to make your data more accurate. It may not apply to a stock market, but it generally holds for quantities such as temperature or wind speed, which change smoothly over time (rather than staying constant for 2 minutes and suddenly changing in one second). You can use different interpolation techniques to polish the data. "Polishing" in this sense is OK because you have to use the concept of an "average" anyway. A good interpolation should bring the data closer to a model that has been proven to work for the real problem.
CODE - I set the max interval to 5 minutes to show huge difference between the two methods. It depends on your observation and experience to decide which (or any other) method is the best to "predict the past".
% reproduce your scenario
N = 20;
max_interval = 5;
time = randi(max_interval,N,1);
time(1) = 1; % first minute
price = randi(10,N,1);
figure(1)
plot(cumsum(time), price, 'ko-', 'LineWidth', 2);
hold on
% "keeping-previous-value" interpolation
% (stored in interpPrev so the built-in interp1 function is not shadowed)
interpPrev = zeros(sum(time),1)-1;
interpPrev(cumsum(time)) = price;
while ismember(-1, interpPrev)
    interpPrev(interpPrev==-1) = interpPrev(find(interpPrev==-1)-1);
end
plot(interpPrev, 'bx--')
% "midpoint" interpolation
interp2 = zeros(sum(time),1)-1;
interp2(cumsum(time)) = price;
for ii = 1:length(interp2)
    if interp2(ii) == -1
        t1 = interp2(ii-1);
        t2 = interp2( find(interp2(ii:end)>-1, 1, 'first') +ii-1);
        interp2(ii) = (t1+t2)/2;
    end
end
plot(interp2, 'rd--')
% "modified-midpoint" interpolation
interp3 = zeros(sum(time),1)-1;
interp3(cumsum(time)) = price;
for ii = 1:length(interp3)
    if interp3(ii) == -1
        t1 = interp3(ii-1);
        t2 = interp3( find(interp3(ii:end)>-1, 1, 'first') +ii-1);
        alpha = 1 / find(interp3(ii:end)>-1, 1, 'first');
        interp3(ii) = (1-alpha)*t1 + alpha*t2;
    end
end
plot(interp3, 'm^--')
hold off
legend('original data', 'interp 1', 'interp 2', 'interp 3')
fprintf(['"keeping-previous-value" (weighted sum) \n', ...
' result: %2.4f \n'], mean(interpPrev));
fprintf(['"midpoint" (linear interpolation) \n', ...
' result: %2.4f \n'], mean(interp2));
fprintf(['"modified-midpoint" (linear interpolation) \n', ...
' result: %2.4f \n'], mean(interp3));
Note: undefined points should be represented by NaN, but -1 seems easier to play with.
This is my suggestion.
Since you have unequal intervals of data, convert it to equally spaced data, keeping the price constant between the original points.
You can then use tsmovavg to calculate the moving average of the price series.
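A minimal sketch of those two steps. I use interp1 with the 'previous' method to hold the last price and, as a stand-in for tsmovavg, the base MATLAB function movmean. The times, prices, 1-minute grid, and 3-sample window are made-up values for illustration.

```matlab
t = [0 1 3 4 6 7]';          % sample times in minutes (hypothetical)
p = [10 12 11 15 14 16]';    % prices at those times (hypothetical)
tt = (t(1):1:t(end))';       % regular 1-minute grid
pp = interp1(t, p, tt, 'previous');  % hold the last known price
ma = movmean(pp, 3);         % 3-sample centered moving average
```

movmean shrinks the window at the two ends of the series, so ma has the same length as pp.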
If you are willing to discretize the time value of your data points, the solution should be very straightforward. No matter what kind of window you choose, as long as it's Lipschitz, it can be computed or approximated in amortized O(1) time for each data point or time step using approaches like summed area table.
Else, use a rectangular running window of fixed width that only 'snaps' to data points. Specifically, update the summation of values of all data points within the window only when a data point is joining/leaving the window.
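A minimal sketch of such a snapping window, assuming a trailing window of width W that ends at each data point. The data values and W are made up, and the running sum is only updated when a point joins or leaves the window.

```matlab
t = [0 1 3 4 6 7]';          % sample times in minutes (hypothetical)
p = [10 12 11 15 14 16]';    % prices (hypothetical)
W = 3;                       % window width in minutes (assumed)
ma = zeros(size(p));
lo = 1;                      % index of the oldest point still in the window
s  = 0;                      % running sum of the points in the window
for hi = 1:numel(t)
    s = s + p(hi);                    % point hi joins the window
    while t(lo) <= t(hi) - W          % points older than W leave the window
        s = s - p(lo);
        lo = lo + 1;
    end
    ma(hi) = s / (hi - lo + 1);       % mean over (t(hi)-W, t(hi)]
end
```

Each point is added once and removed at most once, so the whole pass is linear in the number of data points.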
However, if you want to use custom weights for your data points, the method described above no longer works. You can, of course, approximate your spatial kernel with multiple box functions. Otherwise, you might want to look into general bilateral filtering algorithms, as the problem can be formulated as bilateral filtering with a constant range kernel. See the paper Adaptive Manifolds for Real-Time High-Dimensional Filtering for a recently developed algorithm that's relatively easy to implement on this topic. The author's website also provides code in MATLAB.