random process modeling in signal processing - matlab

i would like to understand one thing and please help me to clarify it,sometimes it is necessary to represent given data by sum of complex exponentials with additive white noise,let us consider following model using sinusoidal model
clear all;
A1=24;
A2=23;
A3=23;
A4=23;
A5=10;
f1=11.01;
f2=11.005;
f3= 10;
f4=10.9
phi=2*pi*(rand(1,4)-0.5);
t=0:0.01:2.93;
k=1:1:294;
x=rand([1,length(t)]);
y(k)=A1.*sin(2*pi*f1*t+phi(1))+A2.*cos(2*pi*f2*t+phi(2))+A3.*sin(2*pi*f3*t+phi(3))+A4.*cos(2*pi*f4*t+phi(4))+A5.*x;
[pxx,f]=periodogram(y,[],[],100);
plot(f,pxx)
there phases are distributed uniformly in range of [-pi pi],but my main question is related following fact.to represent data as linear combination of complex exponentials with phases uniformly distributed in [-pi pi] interval, should we generate these phases outside of sampling or at each sampling process we should generate new list of phases?please help me to clarify this things

As I stated in the my comment, I don't really understand what you're asking. But, I will answer this as if you had asked it on codereview.
The following is not good practice in MATLAB:
A1=24;
A2=23;
A3=23;
A4=23;
A5=10;
There are very few cases (if any), where you actually need such variable names. Instead, the following would be much better:
A = [24 23 23 23 10];
Now, if you want to use A1, you do A(1) instead.
These two lines:
t=0:0.01:2.93;
k=1:1:294;
They are of course the same size (1x294), but when you do it that way, it's easy to get it wrong. You will of course get errors later on if they're not the same size, so it's nice to make sure that you have it correct on the first try, thus using linspace might be a good idea. The following line will give you the same t as the line above. This way it's easier to be sure you have exactly 294 elements, not 293, 295 or 2940 (it is sometimes easy to miss).
t = linspace(0,2.93,294);
Not really important, but k = 1:1:294 can be simplified to k = 1:294, as the default step size is 1.
The syntax .*, is used for element-wise operations. That is, if you want to multiply each element of a vector (or matrix) with the corresponding element in another one. Using it when multiplying vectors with scalars is therefore unnecessary, * is enough.
Again, not an important point, but x=rand([1,length(t)]); is simpler written x=rand(1, length(t)); (without brackets).
You don't need the index k in y(k) = ..., as k is continuous, starting at 1, with increments of 1. This is the default behavior in MATLAB, thus y = ... is enough. If, however, you only wanted to fill in every other number between 1 and 100, you could do y(1:2:100).
This is far from perfect, but in my opinion big step in the right direction:
A = [24 23 23 23 10];
f = [11.01 11.005 10 10.9]; % You might want to use , as a separator here
phi = 2*pi*(rand(1,4)-0.5);
t = linspace(0,2.93,294);
x = rand(1, length(t));
w = 2*pi*f; % For simplicity
y = A(1)*sin(w(1)*t+phi(1)) + A(2)*cos(w(2)*t+phi(2)) + ...
A(3)*sin(w(3)*t+phi(3)) + A(4)*cos(w(4)*t+phi(4))+A(5)*x;
Another option would be:
z = [sin(w(1)*t+phi(1)); cos(w(2)*t+phi(2)); sin(w(3)*t+phi(3)); ...
cos(w(4)*t+phi(4)); x];
y = A.*z;
This will give you the same y as the first one. Having the same w, t and phi as above, the following will also give you the same results:
c = bsxfun(#times,w,t') + kron(phi,ones(294,1));
y = sum(bsxfun(#times,A,[sin(c(:,1)), cos(c(:,2)), sin(c(:,3)), cos(c(:,4)), x']),2)';
I hope something in here might help you some in your further work. And maybe I actually answered your question. =)

Related

How to use for loop with series in MATLAB

I have here an equation that is in the image
I'm trying to make the plot where y-axis is this equation, and x-axis is time - just a vector.
All the initials I have:
%Initials
Beta=[24 123 117 262 108 45]*10^-5; %pcm
Lambda=[0.0127 0.0317 0.1160 0.3106 1.4006 3.8760]; %1/s
LAMBDA=10^-4 ; %s
W=[ 0.376 -0.0133 -0.0426 -0.153 -0.972 -3.38 -29.5]
Rho=400*10^-5
t=linspace(1,30,7)
This is the code I'm using:
for n=1:7
for j=1:6
S1=Rho*sum(exp(W(n)'.*t)/(W(n)'.*(LAMBDA+(sum(Beta(j).*Lambda(j)./(W(n)+Lambda(j))^2)))))
end
end
semilogy(t,S1,'b','linewidth',2);
And S1 returns too much answers, and as I undestand it should give only 7...
And I am new to matlab and coding in general, so if the answer is obviuos I still don't know how to make it work :D
Let's clarify a few things first.
In order to do a 2D plot (of any kind), you need two vectors in Matlab. They should be of equal length. One for all the x-coordinates. Another for all the y-coordinates.
You get the x-coordinates in t=linspace(1,30,7). However, you do not have the corresponding y-coordinates.
In your case, it is best to phrase your formula as a function of t. And let's break down the sums for clarity. For example,
function num = oscillation_modes_of_sort(t)
outer_sum = 0;
for j=1:numel(W)
inner_sum = 0;
for i=1:numel(Beta)
inner_sum = inner_sum + Beta(i)*Lambda(i)/(W(j)+Lambda(i))^2;
end
outer_sum = out_sum + exp(W(j)*t)/(W(j)*(LAMBDA+inner_sum));
end
num = Rho * outer_sum;
end
Now, your y-coordinates will be oscillation_modes_of_sort(t).
There are ways to either make the code more brief or make it more friendly if W and Beta are much much longer. But let's do those at a future time.

Small bug in MATLAB R2017B LogLikelihood after fitnlm?

Background: I am working on a problem similar to the nonlinear logistic regression described in the link [1] (my problem is more complicated, but link [1] is enough for the next sections of this post). Comparing my results with those obtained in parallel with a R package, I got similar results for the coefficients, but (very approximately) an opposite logLikelihood.
Hypothesis: The logLikelihood given by fitnlm in Matlab is in fact the negative LogLikelihood. (Note that this impairs consequently the BIC and AIC computation by Matlab)
Reasonning: in [1], the same problem is solved through two different approaches. ML-approach/ By defining the negative LogLikelihood and making an optimization with fminsearch. GLS-approach/ By using fitnlm.
The negative LogLikelihood after the ML-approach is:380
The negative LogLikelihood after the GLS-approach is:-406
I imagine the second one should be at least multiplied by (-1)?
Questions: Did I miss something? Is the (-1) coefficient enough, or would this simple correction not be enough?
Self-contained code:
%copy-pasting code from [1]
myf = #(beta,x) beta(1)*x./(beta(2) + x);
mymodelfun = #(beta,x) 1./(1 + exp(-myf(beta,x)));
rng(300,'twister');
x = linspace(-1,1,200)';
beta = [10;2];
beta0=[3;3];
mu = mymodelfun(beta,x);
n = 50;
z = binornd(n,mu);
y = z./n;
%ML Approach
mynegloglik = #(beta) -sum(log(binopdf(z,n,mymodelfun(beta,x))));
opts = optimset('fminsearch');
opts.MaxFunEvals = Inf;
opts.MaxIter = 10000;
betaHatML = fminsearch(mynegloglik,beta0,opts)
neglogLH_MLApproach = mynegloglik(betaHatML);
%GLS Approach
wfun = #(xx) n./(xx.*(1-xx));
nlm = fitnlm(x,y,mymodelfun,beta0,'Weights',wfun)
neglogLH_GLSApproach = - nlm.LogLikelihood;
Source:
[1] https://uk.mathworks.com/help/stats/examples/nonlinear-logistic-regression.html
This answer (now) only details which code is used. Please see Tom Lane's answer below for a substantive answer.
Basically, fitnlm.m is a call to NonLinearModel.fit.
When opening NonLinearModel.m, one gets in line 1209:
model.LogLikelihood = getlogLikelihood(model);
getlogLikelihood is itself described between lines 1234-1251.
For instance:
function L = getlogLikelihood(model)
(...)
L = -(model.DFE + model.NumObservations*log(2*pi) + (...) )/2;
(...)
Please also not that this notably impacts ModelCriterion.AIC and ModelCriterion.BIC, as they are computed using model.LogLikelihood ("thinking" it is the logLikelihood).
To get the corresponding formula for BIC/AIC/..., type:
edit classreg.regr.modelutils.modelcriterion
this is Tom from MathWorks. Take another look at the formula quoted:
L = -(model.DFE + model.NumObservations*log(2*pi) + (...) )/2;
Remember the normal distribution has a factor (1/sqrt(2*pi)), so taking logs of that gives us -log(2*pi)/2. So the minus sign comes from that and it is part of the log likelihood. The property value is not the negative log likelihood.
One reason for the difference in the two log likelihood values is that the "ML approach" value is computing something based on the discrete probabilities from the binomial distribution. Those are all between 0 and 1, and they add up to 1. The "GLS approach" is computing something based on the probability density of the continuous normal distribution. In this example, the standard deviation of the residuals is about 0.0462. That leads to density values that are much higher than 1 at the peak. So the two things are not really comparable. You would need to convert the normal values to probabilities on the same discrete intervals that correspond to individual outcomes from the binomial distribution.

Sampling and DTFT in Matlab

I need to produce a signal x=-2*cos(100*pi*n)+2*cos(140*pi*n)+cos(200*pi*n)
So I put it like this :
N=1024;
for n=1:N
x=-2*cos(100*pi*n)+2*cos(140*pi*n)+cos(200*pi*n);
end
But What I get is that the result keeps giving out 1
I tried to test each values according to each n, and I get the same results for any n
For example -2*cos(100*pi*n) with n=1 has to be -1.393310473. Instead of that, Matlab gave the result -2 for it and it always gave -2 for any n
I don't know how to fix it, so I hope someone could help me out! Thank you!
Not sure where you get the idea that -2*cos(100*pi) should be anything other than -2. Maybe you are not aware that Matlab works in radians?
Look at your expression. Each term can be factored to contain 2*pi*(an integer). And you should know that cos(2*pi*(an integer)) = 1.
So the results are exactly as expected.
What you are seeing is basically what happens when you under-sample a waveform. You may know that the Nyquist criterion says that you need to have a sampling rate that is at least two times greater than the highest frequency component present; but in your case, you are sampling one point every 50, 70, 100 complete cycles. So you are "far beyond Nyquist". And that can only be solved by sampling more closely.
For example, you could do:
t = linspace(0, 1, 1024); % sample the waveform 1024 times between 0 and 1
f1 = 50;
f2 = 70;
f3 = 100;
signal = -2*cos(2*pi*f1*t) + 2*cos(2*pi*f2*t) + cos(2*pi*f3*t);
figure; plot(t, signal)
I think you are using degrees when you are doing your calculations, so do this:
n = 1:1024
x=-2*cosd(100*pi*n)+2*cosd(140*pi*n)+cosd(200*pi*n);
cosd uses degrees instead of radians. Radians is the default for cos so matlab has a separate function when degree input is used. For me this gave:
-2*cosd(100*pi*1) = -1.3933
The first term that I got using:
x=-2*cosd(100*pi*1)+2*cosd(140*pi*1)+cosd(200*pi*1)
x = -1.0693
Also notice that I defined n as n = 1:1024; this will give all integers from 1,2,...,1024,
there is no need to use a for loop since many of Matlab's built in functions are vectorized. Meaning you can just input a vector and it will calculate the function for every element in the vector.

A moving average with different functions and varying time-frames

I have a matrix time-series data for 8 variables with about 2500 points (~10 years of mon-fri) and would like to calculate the mean, variance, skewness and kurtosis on a 'moving average' basis.
Lets say frames = [100 252 504 756] - I would like calculate the four functions above on over each of the (time-)frames, on a daily basis - so the return for day 300 in the case with 100 day-frame, would be [mean variance skewness kurtosis] from the period day201-day300 (100 days in total)... and so on.
I know this means I would get an array output, and the the first frame number of days would be NaNs, but I can't figure out the required indexing to get this done...
This is an interesting question because I think the optimal solution is different for the mean than it is for the other sample statistics.
I've provided a simulation example below that you can work through.
First, choose some arbitrary parameters and simulate some data:
%#Set some arbitrary parameters
T = 100; N = 5;
WindowLength = 10;
%#Simulate some data
X = randn(T, N);
For the mean, use filter to obtain a moving average:
MeanMA = filter(ones(1, WindowLength) / WindowLength, 1, X);
MeanMA(1:WindowLength-1, :) = nan;
I had originally thought to solve this problem using conv as follows:
MeanMA = nan(T, N);
for n = 1:N
MeanMA(WindowLength:T, n) = conv(X(:, n), ones(WindowLength, 1), 'valid');
end
MeanMA = (1/WindowLength) * MeanMA;
But as #PhilGoddard pointed out in the comments, the filter approach avoids the need for the loop.
Also note that I've chosen to make the dates in the output matrix correspond to the dates in X so in later work you can use the same subscripts for both. Thus, the first WindowLength-1 observations in MeanMA will be nan.
For the variance, I can't see how to use either filter or conv or even a running sum to make things more efficient, so instead I perform the calculation manually at each iteration:
VarianceMA = nan(T, N);
for t = WindowLength:T
VarianceMA(t, :) = var(X(t-WindowLength+1:t, :));
end
We could speed things up slightly by exploiting the fact that we have already calculated the mean moving average. Simply replace the within loop line in the above with:
VarianceMA(t, :) = (1/(WindowLength-1)) * sum((bsxfun(#minus, X(t-WindowLength+1:t, :), MeanMA(t, :))).^2);
However, I doubt this will make much difference.
If anyone else can see a clever way to use filter or conv to get the moving window variance I'd be very interested to see it.
I leave the case of skewness and kurtosis to the OP, since they are essentially just the same as the variance example, but with the appropriate function.
A final point: if you were converting the above into a general function, you could pass in an anonymous function as one of the arguments, then you would have a moving average routine that works for arbitrary choice of transformations.
Final, final point: For a sequence of window lengths, simply loop over the entire code block for each window length.
I have managed to produce a solution, which only uses basic functions within MATLAB and can also be expanded to include other functions, (for finance: e.g. a moving Sharpe Ratio, or a moving Sortino Ratio). The code below shows this and contains hopefully sufficient commentary.
I am using a time series of Hedge Fund data, with ca. 10 years worth of daily returns (which were checked to be stationary - not shown in the code). Unfortunately I haven't got the corresponding dates in the example so the x-axis in the plots would be 'no. of days'.
% start by importing the data you need - here it is a selection out of an
% excel spreadsheet
returnsHF = xlsread('HFRXIndices_Final.xlsx','EquityHedgeMarketNeutral','D1:D2742');
% two years to be used for the moving average. (250 business days in one year)
window = 500;
% create zero-matrices to fill with the MA values at each point in time.
mean_avg = zeros(length(returnsHF)-window,1);
st_dev = zeros(length(returnsHF)-window,1);
skew = zeros(length(returnsHF)-window,1);
kurt = zeros(length(returnsHF)-window,1);
% Now work through the time-series with each of the functions (one can add
% any other functions required), assinging the values to the zero-matrices
for count = window:length(returnsHF)
% This is the most tricky part of the script, the indexing in this section
% The TwoYearReturn is what is shifted along one period at a time with the
% for-loop.
TwoYearReturn = returnsHF(count-window+1:count);
mean_avg(count-window+1) = mean(TwoYearReturn);
st_dev(count-window+1) = std(TwoYearReturn);
skew(count-window+1) = skewness(TwoYearReturn);
kurt(count-window +1) = kurtosis(TwoYearReturn);
end
% Plot the MAs
subplot(4,1,1), plot(mean_avg)
title('2yr mean')
subplot(4,1,2), plot(st_dev)
title('2yr stdv')
subplot(4,1,3), plot(skew)
title('2yr skewness')
subplot(4,1,4), plot(kurt)
title('2yr kurtosis')

MATLAB: n-minute/hour/day averages of a time-series

This is a follow-up to an earlier question of mine posted here. Based on Oleg Komarov's answer I wrote a little tool to get daily, hourly, etc. averages or sums of my data that uses accumarray() and datevec()'s output structure. Feel free to have a look at it here (it's probably not written very well, but it works for me).
What I would like to do now is add the functionality to calculate n-minute, n-hour, n-day, etc. statistics instead of 1-minute, 1-hour, 1-day, etc. like my function does. I have a rough idea that simply loops over my time-vector t (which would be pretty much what I would have done already if I hadn't learnt about the beautiful accumarray()), but that means I have to do a lot of error-checking for data gaps, uneven sampling times, etc.
I wonder if there is a more elegant/efficient approach that lets me re-use/extend my old function posted above, i.e. something that still makes use of accumarray() and datevec(), since this makes working with gaps very easy.
You can download some sample data taken from my last question here. These were sampled at 30 min intervals, so a possible example of what I want to do would be to calculate 6 hour averages without relying on the assumption that they are free of gaps and/or always sampled at exactly 30 min.
This is what I have come up with so far, which works reasonably well, apart from a small but easily fixed problem with the time stamps (e.g. 0:30 is representative for the interval from 0:30 to 0:45 -- my old function suffers from the same problem, though):
[ ... see my answer below ...]
Thanks to woodchips for inspiration.
The linked method of using accumarray seems overkill and too complex to me if you start with evenly spaced measurements without any gaps. I have the following function in my private toolbox for calculating an N-point average of vectors:
function y = blockaver(x, n)
% y = blockaver(x, n)
% input points are averaged over n points
% always returns column vector
if n == 1
y = x(:);
else
nblocks = floor(length(x) / n);
y = mean(reshape(x(1:n * nblocks), n, nblocks), 1).';
end
Works pretty well for quick and dirty decimating by a factor N, but note that it does not apply proper anti-alias filtering. Use decimate if that is important.
I guess I figured it out using parts of #Bas Swinckels answer and #woodchip 's code linked above. Not exactly what I would call good code, but working and reasonably fast.
function [ t_acc, x_acc, subs ] = ts_aggregation( t, x, n, target_fmt, fct_handle )
% t is time in datenum format (i.e. days)
% x is whatever variable you want to aggregate
% n is the number of minutes, hours, days
% target_fmt is 'minute', 'hour' or 'day'
% fct_handle can be an arbitrary function (e.g. #sum)
t = t(:);
x = x(:);
switch target_fmt
case 'day'
t_factor = 1;
case 'hour'
t_factor = 1 / 24;
case 'minute'
t_factor = 1 / ( 24 * 60 );
end
t_acc = ( t(1) : n * t_factor : t(end) )';
subs = ones(length(t), 1);
for i = 2:length(t_acc)
subs(t > t_acc(i-1) & t <= t_acc(i)) = i;
end
x_acc = accumarray( subs, x, [], fct_handle );
end
/edit: Updated to a much shorter fnction that does use loops, but appears to be faster than my previous solution.