Apply an equation over a data series without a for loop - matlab

I need to apply a formulation over a set of huge data series (~525600 x 10).
The formulation is the weighted moving sum implemented in the loop below.
I've written the following code to apply it, but it is far too slow:
ntr=10; % number of different sets of data series
phi=90; % days
phivec=1:1/24:2*phi+1; % for hourly data
cte=sum(10.^(phivec./phi));
phivec=repmat(phivec',[1,ntr]);
for i = 2*24*phi+1:nti % size(Omega) = (nti, ntr): rows are time steps
    OmegaEQ(i,:)=sum(Omega(i-24*phi*2:i,:).*10.^(phivec./phi))./cte;
end
Can someone help me do it faster?
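One possible way to avoid the loop, assuming the formulation is exactly the weighted sum the loop computes: each output row is a fixed set of weights applied to the most recent samples, which is what a FIR filter does, so MATLAB's built-in filter function can process every column in one call. The sketch below uses small random data in place of Omega, and the names w, K and b are introduced here for illustration only.

```matlab
% Hypothetical small example: random data stands in for Omega.
phi = 2;                         % days (90 in the question)
ntr = 3;                         % number of data series
nti = 500;                       % number of hourly samples
Omega = rand(nti, ntr);          % rows are time steps, columns are series

w = 10.^((1:1/24:2*phi+1).' ./ phi);   % weights, oldest sample first
K = numel(w) - 1;                % the window reaches back K = 2*24*phi samples
b = flipud(w) ./ sum(w);         % filter taps, newest sample first

OmegaEQ = filter(b, 1, Omega);   % filters each column independently
OmegaEQ(1:K, :) = 0;             % rows without a full window stay zero, as in the loop
```

With phi = 90 the filter has 4321 taps, so this is still a sizeable computation, but it runs in compiled code instead of an interpreted loop.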

Related

K-Means on temporal dataset

I have a temporal dataset(1000000x70) consisting of info about the activities of 20 subjects. I need to apply subsampling to the dataset as it has more than a million rows. How to select a set of observations of each subject ideally from it? Later, I need to apply PCA and K-means on it. Kindly help me with the steps to be followed. I'm working in MATLAB.
I'm not really clear on what you're looking for. If you just want to subsample a matrix in MATLAB, here is a way to do it:
myData; % 70 x 1000000 data
nbDataPts = size(myData, 2); % Get the number of points in the data
subsampleRatio = 0.1; % Ratio of data you want to keep
nbSamples = round(subsampleRatio * nbDataPts); % How many points to keep
sampleIdx = round(linspace(1, nbDataPts, nbSamples)); % Evenly space indices of the points to keep
sampledData = myData(:, sampleIdx); % Sampling data
Then, if you want to apply PCA and K-means, I suggest you take a look at the documentation for the pca and kmeans functions.
Try to work with it, and open a new question if a specific problem arises.
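If the goal is to keep the same number of observations per subject, the evenly spaced indexing above can be applied within each subject's rows. This is only a sketch under assumptions: rows are observations (the 1000000x70 layout from the question, not the 70 x 1000000 layout above), a hypothetical vector subj holds the subject ID of each row, the data is random, and pca, kmeans and zscore come from the Statistics and Machine Learning Toolbox. The number of components and clusters is arbitrary.

```matlab
rng(0);                               % reproducible random example
X    = randn(2000, 5);                % hypothetical data: rows = observations
subj = randi(4, 2000, 1);             % hypothetical subject ID for each row
keepPerSubject = 50;                  % observations to keep per subject

keep = [];
for s = unique(subj).'
    rows = find(subj == s);           % rows belonging to subject s, in time order
    pick = round(linspace(1, numel(rows), keepPerSubject));
    keep = [keep; rows(pick)];        %#ok<AGROW> evenly spaced rows of subject s
end
Xs = X(keep, :);                      % subsampled data

[coeff, score] = pca(zscore(Xs));     % PCA on standardized features
idx = kmeans(score(:, 1:2), 3);       % K-means on the first two components
```

Evenly spaced picks preserve the temporal order within each subject; randperm could be used instead if a random subsample is preferred.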

Problem with defining a transfer function for Bode plot in MATLAB

I am trying to tune a PID controller using Matlab(not Simulink because I am learning/uni coursework).
1. Summarize the problem:
So, I have a transfer function of a system for which there is a phase margin requirement that needs to be met.
In order to find the phase advance part of the PID, I need to solve a set of equations and then plot a Bode plot using the calculated variables.
2. Describe what I've tried
I tried replacing the tf([num],[den]) arguments with real numbers, but that defeats the purpose: I want MATLAB to calculate GR and the frequency and substitute them into the tf.
Problem
Full_Code:
https://drive.google.com/file/d/1sWUnvvye_RBXGL8-nWq___3F5UDmDOoG/view?usp=sharing
Minimum reproducible code example:
clearvars;clc;clearAllMemoizedCaches;clear
syms s w
%--------------TF of the aircraft
G(s)= (160*(s+2.5)*(s+0.7))/((s^2+5*s+40)*(s^2+0.03*s+0.06));
k= 8; % selected k value range 4<k<8
Max_PA=asind((k-1)/(k+1)); % computes max phase advance
Centre_dB= 20*log10(sqrt(k)); % computing centre gain in dB
Poi= -120-Max_PA % looking for Point of interest(Poi)
tf_int= subs(G(s),1j*w); %intermediate transfer function
eqn= atan2d(imag(tf_int),real(tf_int))==Poi; % solve for w at Poi
% computing crossover freq(wc)
wc= vpasolve(eqn,w); % find exactly the wc at Poi
GR=20*log10(abs(subs(tf_int,w,wc))); % find the gain at wc
Kpa= 10^((GR-Centre_dB)/20);
ti= 1/(sqrt(k)*wc); % computing Kpa and ti
num1= [Kpa*k*ti,Kpa];
den2= [ti,1];
PA= tf(num1,den2) %PA tf defined
You are trying to pass non-numeric (symbolic) values into tf, which only accepts numeric arrays. You can convert them with double():
PA= tf(double(num1),double(den2)) %PA tf defined

Plotting Derivative of data in matlab

I am pretty new to MATLAB and I have some current vs. time data stored in a structure in a MATLAB file.
What I am trying to plot is current vs. time along with its first derivative (di/dt). I used the diff function, but the plot looks really weird.
I know it's simple, but can anyone explain it?
Thanks in advance.
Assume you have a structure S, where S.t is the time vector and S.I is the current vector at each time in S.t
(both should have the same length N).
Now, if you want to approximate the derivative:
dt = diff(S.t); % time interval lengths; dt has N-1 elements
dI = diff(S.I);
derivative = dI./dt; % element-wise division of dI by dt
plot(S.t(1:end-1),derivative); % both vectors must have the same length when plotting:
% S.t(1:end-1) is the same as S.t without the last element
I think this should work.
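As a quick sanity check of this approach, here is a small sketch on made-up data where the true derivative is known. The struct contents are hypothetical (a sine wave stands in for the measured current), and gradient is shown as an alternative that returns N values instead of N-1.

```matlab
S.t = linspace(0, 2*pi, 1000).';       % hypothetical time vector
S.I = sin(S.t);                        % hypothetical current, so dI/dt = cos(t)

dIdt_fwd = diff(S.I) ./ diff(S.t);     % N-1 values from forward differences
tMid     = S.t(1:end-1) + diff(S.t)/2; % interval midpoints, where those values fit best
dIdt_ctr = gradient(S.I, S.t);         % N values, central differences in the interior

errFwd = max(abs(dIdt_fwd - cos(tMid)));  % error against the known derivative
errCtr = max(abs(dIdt_ctr - cos(S.t)));
```

Both can then be plotted with plot(tMid, dIdt_fwd) or plot(S.t, dIdt_ctr). If the result still looks odd on measured data, one likely cause is noise, which differencing amplifies, so smoothing the current first may help.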

Naïve Bayes Classifier -- is normalization necessary?

We recently studied the Naïve Bayesian Classifier in our Machine Learning class and now I'm trying to implement it on the Fisher Iris dataset as a self-exercise. The concept is easy and straightforward, with some trickiness involved for continuous attributes. I read up several literature resources which recommended using a Gaussian approximation to compute probability of test data values, so I'm going with it in my code.
Now I'm trying to run it initially for 50% training and 50% test data samples, but something is missing. The current code is always predicting class 1 (I used integers to represent the classes) for all test samples, which is obviously wrong.
My guess is that the problem may be due to normalization being omitted by the code? Though I think adding normalization would still yield proportionate results, and so far my attempts to normalize have produced the same classification results.
Can someone please suggest if there is anything obvious missing here? Or if I'm not approaching this right? Since most of the code is 'mechanics', I have made prominent (****************) the 2 lines that are responsible for the calculations. Any help is appreciated, thanks!
nsamples=75; % 50% samples
% acquire training set and test set
[trainingSample,idx] = datasample(data,nsamples,'Replace',false);
testData = data(setdiff(1:150,idx),:);
% define Gaussian function
%***********************************************************%
Phi=@(mu,sig2,x) (1/sqrt(2*pi*sig2))*exp(-((x-mu)^2)/2*sig2);
%***********************************************************%
for c=1:3 % for 3 classes in training set
    clear y x mu sig2;
    index=1;
    for i=1 : length(trainingSample)
        if trainingSample(i,5)==c
            y(index,:)=trainingSample(i,:); % filter current class samples
            index=index+1; % for conditional probabilities
        end
    end
    for j=1:size(testData,1) % iterate over test samples
        clear pf p;
        for i=1:4 % iterate over columns
            x=testData(j,i); % representing attributes
            mu=mean(y(:,i));
            sig2=var(y(:,i));
            pf(i) = Phi(mu,sig2,x); % calc conditional probability
        end
        % calc class likelihood; prior * posterior
        %*****************************************************%
        pc(j,c) = size(y,1)/nsamples * pf(1)*pf(2)*pf(3)*pf(4);
        %*****************************************************%
    end
end
% find the predicted class for each test sample
% by taking the max probability calculated
for i=1:size(pc,1)
    [~,q]=max(pc(i,:));
    predicted(i)=q;
    actual(i)=testData(i,5);
end
Normalization shouldn't be necessary, since each feature is only compared against its own per-class distribution.
p(class|thing) ∝ p(class)p(thing|class) =
= p(class)p(feature_1|class)p(feature_2|class)...p(feature_N|class)
So when fitting the parameters of the distribution feature_i|class (here mu and sigma2), normalization just rescales them to the new scale; every class density for that feature is multiplied by the same constant, so the class ranking stays the same.
The MATLAB code is hard to read because of all the indexing and the training/test splitting, which is a possible source of problems.
You should try something with a lot less unnecessary code around it (I would recommend Python with scikit-learn, for example, which has a lot of helpers for splitting data and such: http://scikit-learn.org/).
It's really important that you separate the training and test data, train the model only on the training data, and test the trained model only on the test data. (Is this done?)
The next step is to check the parameters, which is most easily done by either printing them out (sanity check) or,
for each feature, rendering the fitted Gaussian bells next to a histogram of the data to see that they match (remember that each histogram bar must have height number_of_samples_within_range/total_number_of_samples, divided by the bin width if you compare it with the density).
Visualising the data and the model is really important for knowing what is happening.
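One more thing that may be worth checking, offered as a suggestion and not as a confirmed diagnosis: in the question's Phi, the expression -((x-mu)^2)/2*sig2 is evaluated left to right, so the squared distance is divided by 2 and then multiplied by sig2, instead of being divided by 2*sig2. A minimal sketch of the density with the parentheses in place, using made-up parameter values, together with a numerical check that it integrates to about 1:

```matlab
% Gaussian density with the variance in the denominator of the exponent.
Phi = @(mu, sig2, x) (1 ./ sqrt(2*pi*sig2)) .* exp(-((x - mu).^2) ./ (2*sig2));

mu   = 5.8;                          % hypothetical class mean of one feature
sig2 = 0.25;                         % hypothetical class variance of that feature

x    = linspace(mu - 10, mu + 10, 20001);
area = trapz(x, Phi(mu, sig2, x));   % close to 1 for a valid density
peak = Phi(mu, sig2, mu);            % equals 1/sqrt(2*pi*sig2)
```

When combining the four features, summing the logarithms of the per-feature densities instead of multiplying the densities can also help avoid numerical underflow.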

Fast Fourier transform for deseasonalizing data in MATLAB

I'm very much a novice at signal processing techniques, but I am trying to apply the fast fourier transform to a daily time series to remove the seasonality present in the data. The example I am working with is from here:
http://www.mathworks.com/help/signal/ug/frequency-domain-linear-regression.html
While I understand how to implement the code as it is written in the example, I am having a hard time adapting it to my specific application. What I am trying to do is create a preprocessing function which deseasonalizes the training data using code similar to the above example. Then, using the same estimated coefficients from the in-sample data, I want to deseasonalize the out-of-sample data to preserve its independence from the in-sample data. Basically, once the coefficients are estimated, I will normalize each new data point using the same coefficients. I suspect this is akin to estimating a linear trend, removing it from the in-sample data, and then using the same linear model on unseen data to detrend it in the same manner.
Obviously, when I estimate the fourier coefficients, the vector I get out is equal to the length of the in-sample data. The out-of-sample data is comprised of much fewer observations, so directly applying them is impossible.
Is this sort of analysis possible using this technique or am I going down a dead end road? How should I approach that using the code in the example above?
What you want to do is certainly possible, you are on the right track, but you seem to misunderstand a few points in the example. First, it is shown in the example that the technique is the equivalent of linear regression in the time domain, exploiting the FFT to perform in the frequency domain an operation with the same effect. Second, the trend that is removed is not linear, it is equal to a sum of sinusoids, which is why FFT is used to identify particular frequency components in a relatively tidy way.
In your case it seems you are interested in the residuals. The initial approach is therefore to proceed as in the example as follows:
(1) Perform a rough "detrending" by removing the DC component (the mean of the time-domain data)
(2) FFT and inspect the data, choose frequency channels that contain most of the signal.
You can then use those channels to generate a trend in the time domain and subtract that from the original data to obtain the residuals. You need not proceed by using IFFT, however. Instead you can explicitly sum over the cosine and sine components. You do this in a way similar to the last step of the example, which explains how to find the amplitudes via time-domain regression, but substituting the amplitudes obtained from the FFT.
The following code shows how you can do this:
tim = (time - time0)/timestep; % <-- acquisition times for your *new* data, normalized (row vector)
NFpick = [2 7 13]; % <-- channels you picked to build the detrending baseline
% Compute the trend
mu = mean(ts); % ts is the in-sample series
tsdft = fft(ts-mu);
Nchannels = length(ts); % <-- size of time domain data
Mpick = 2*length(NFpick);
X = zeros(numel(tim), Mpick); % regressors: one cosine and one sine column per channel
X(:,1:2:Mpick) = cos(2*pi*(NFpick-1)'/Nchannels*tim)';
X(:,2:2:Mpick) = sin(-2*pi*(NFpick-1)'/Nchannels*tim)';
% Generate beta vector "bet" containing scaled amplitudes from the spectrum
bet = 2*tsdft(NFpick)/Nchannels;
bet = reshape([real(bet(:)) imag(bet(:))].', numel(bet)*2,1); % interleave [re1; im1; re2; im2; ...]
trend = X*bet + mu;
To remove the trend just do
detrended = dat - trend;
where dat is your new data acquired at times tim. Make sure you define the time origin consistently. In addition this assumes the data is real (not complex), as in the example linked to. You'll have to examine the code to make it work for complex data.
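To see that amplitudes taken from the in-sample FFT carry over to new time points, here is a small self-contained check on synthetic data. Everything in it is an assumption made for illustration: the series is built from two sinusoids with a whole number of cycles in the in-sample window, so the trend is known exactly and channels 6 and 13 are the ones to pick.

```matlab
N  = 200;                                  % in-sample length
t  = (0:N-1).';                            % in-sample times, already normalized
ts = 3 + 2*cos(2*pi*5*t/N + 0.3) + 0.5*sin(2*pi*12*t/N);   % synthetic seasonal series

NFpick = [6 13];                           % FFT bins for 5 and 12 cycles (1-based)
mu     = mean(ts);
tsdft  = fft(ts - mu);
bet    = 2*tsdft(NFpick)/N;                % scaled complex amplitudes
bet    = reshape([real(bet(:)) imag(bet(:))].', [], 1);

tim = N:N+49;                              % out-of-sample times (row vector)
X   = zeros(numel(tim), 2*numel(NFpick));
X(:,1:2:end) = cos(2*pi*(NFpick-1)'/N*tim)';
X(:,2:2:end) = sin(-2*pi*(NFpick-1)'/N*tim)';
trend = X*bet + mu;                        % seasonal trend predicted for the new times

expected = 3 + 2*cos(2*pi*5*tim.'/N + 0.3) + 0.5*sin(2*pi*12*tim.'/N);
maxErr   = max(abs(trend - expected));     % tiny when the picked bins capture the signal
```

With real data the seasonal periods rarely fit the window exactly, so some leakage into neighbouring bins and a nonzero residual should be expected.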