Related
Trying to undertsand how MATLAB does resampling, looking at toolbox/signal/resample.m.
This is the part where low-pass FIR filter coefficients are calculated:
fc = 1/2/pqmax;
L = 2*N*pqmax + 1;
h = firls( L-1, [0 2*fc 2*fc 1], [1 1 0 0]).*kaiser(L,bta)' ;
h = p*h/sum(h);
(In this case pqmax and p both represent downsampling ratio.)
Not sure what is the purpose of last line where all filter coefficients are scaled by downsampling ratio over coeffs sum, p/sum(h)? Does anyone know the theory behind?
MATLAB firls
MATLAB kaiser
firls theory: Least Squared Error Design of FIR Filters
In most filters, you want the output magnitude to remain equal after filtering, not scaled up or down.
Mathematically this is generally properly done in the filter equation, but in discrete numbers this doesnt generally doesn't apply.
If you divide the filer by its sum, you make sure that for any input data p, the filter will always have a general scale factor of 1, as its normalized. For a p=ones(1:1000,1), if sum(h) is not 1, the result after filtering will be scaled, i.e. the values of p_filtered will not be 1.
I'm kind've new to Matlab and stack overflow to begin with, so if I do something wrong outside of the guidelines, please don't hesitate to point it out. Thanks!
I have been trying to do convolution between two functions and I have been having a hard time trying to get it to work.
t=0:.01:10;
h=exp(-t);
x=zeros(size(t)); % When I used length(t), I would get an error that says in conv(), A and B must be vectors.
x(1)=2;
x(4)=5;
y=conv(h,x);
figure; subplot(3,1,1);plot(t,x); % The discrete function would not show (at x=1 and x=4)
subplot(3,1,2);plot(t,h);
subplot(3,1,3);plot(t,y(1:length(t))); %Nothing is plotted here when ran
I commented my issues with the code. I don't understand the difference of length and size in this case and how it would make a difference.
For the second comment, x=1 should have an amplitude of 2. While x=4 should have an amplitude of 5. When plotted, it only shows nothing in the locations specified but looks jumbled up at x=0. I'm assuming that's the reason why the convoluted plot won't be displayed.
The original problem statement is given if it helps to understand what I was thinking throughout.
Consider an input signal x(t) that consists of two delta functions at t = 1 and t = 4 with amplitudes A1 = 5 and A2 = 2, respectively, to a linear system with impulse response h that is an exponential pulse (h(t) = e ^−t ). Plot x(t), h(t) and the output of the linear system y(t) for t in the range of 0 to 10 using increments of 0.01. Use the MATLAB built-in function conv.
The initial question regarding size vs length
length yields a scalar that is equal to the largest dimension of the input. In the case of your array, the size is 1 x N, so length yields N.
size(t)
% 1 1001
length(t)
% 1001
If you pass a scalar (N) to ones, zeros, or a similar function, it will create a square matrix that is N x N. This results in the error that you see when using conv since conv does not accept matrix inputs.
size(ones(length(t)))
% 1001 1001
When you pass a vector to ones or zeros, the output will be that size so since size returns a vector (as shown above), the output is the same size (and a vector) so conv does not have any issues
size(ones(size(t)))
% 1 1001
If you want a vector, you need to explicitly specify the number of rows and columns. Also, in my opinion, it's better to use numel to the number of elements in a vector as it's less ambiguous than length
z = zeros(1, numel(t));
The second question regarding the convolution output:
First of all, the impulses that you create are at the first and fourth index of x and not at the locations where t = 1 and t = 4. Since you create t using a spacing of 0.01, t(1) actually corresponds to t = 0 and t(4) corresponds to t = 0.03
You instead want to use the value of t to specify where to put your impulses
x(t == 1) = 2;
x(t == 4) = 5;
Note that due to floating point errors, you may not have exactly t == 1 and t == 4 so you can use a small epsilon instead
x(abs(t - 1) < eps) = 2;
x(abs(t - 4) < eps) = 5;
Once we make this change, we get the expected scaled and shifted versions of the input function.
I am implementing logistic regression using batch gradient descent. There are two classes into which the input samples are to be classified. The classes are 1 and 0. While training the data, I am using the following sigmoid function:
t = 1 ./ (1 + exp(-z));
where
z = x*theta
And I am using the following cost function to calculate cost, to determine when to stop training.
function cost = computeCost(x, y, theta)
htheta = sigmoid(x*theta);
cost = sum(-y .* log(htheta) - (1-y) .* log(1-htheta));
end
I am getting the cost at each step to be NaN as the values of htheta are either 1 or zero in most cases. What should I do to determine the cost value at each iteration?
This is the gradient descent code for logistic regression:
function [theta,cost_history] = batchGD(x,y,theta,alpha)
cost_history = zeros(1000,1);
for iter=1:1000
htheta = sigmoid(x*theta);
new_theta = zeros(size(theta,1),1);
for feature=1:size(theta,1)
new_theta(feature) = theta(feature) - alpha * sum((htheta - y) .*x(:,feature))
end
theta = new_theta;
cost_history(iter) = computeCost(x,y,theta);
end
end
There are two possible reasons why this may be happening to you.
The data is not normalized
This is because when you apply the sigmoid / logit function to your hypothesis, the output probabilities are almost all approximately 0s or all 1s and with your cost function, log(1 - 1) or log(0) will produce -Inf. The accumulation of all of these individual terms in your cost function will eventually lead to NaN.
Specifically, if y = 0 for a training example and if the output of your hypothesis is log(x) where x is a very small number which is close to 0, examining the first part of the cost function would give us 0*log(x) and will in fact produce NaN. Similarly, if y = 1 for a training example and if the output of your hypothesis is also log(x) where x is a very small number, this again would give us 0*log(x) and will produce NaN. Simply put, the output of your hypothesis is either very close to 0 or very close to 1.
This is most likely due to the fact that the dynamic range of each feature is widely different and so a part of your hypothesis, specifically the weighted sum of x*theta for each training example you have will give you either very large negative or positive values, and if you apply the sigmoid function to these values, you'll get very close to 0 or 1.
One way to combat this is to normalize the data in your matrix before performing training using gradient descent. A typical approach is to normalize with zero-mean and unit variance. Given an input feature x_k where k = 1, 2, ... n where you have n features, the new normalized feature x_k^{new} can be found by:
m_k is the mean of the feature k and s_k is the standard deviation of the feature k. This is also known as standardizing data. You can read up on more details about this on another answer I gave here: How does this code for standardizing data work?
Because you are using the linear algebra approach to gradient descent, I'm assuming you have prepended your data matrix with a column of all ones. Knowing this, we can normalize your data like so:
mX = mean(x,1);
mX(1) = 0;
sX = std(x,[],1);
sX(1) = 1;
xnew = bsxfun(#rdivide, bsxfun(#minus, x, mX), sX);
The mean and standard deviations of each feature are stored in mX and sX respectively. You can learn how this code works by reading the post I linked to you above. I won't repeat that stuff here because that isn't the scope of this post. To ensure proper normalization, I've made the mean and standard deviation of the first column to be 0 and 1 respectively. xnew contains the new normalized data matrix. Use xnew with your gradient descent algorithm instead. Now once you find the parameters, to perform any predictions you must normalize any new test instances with the mean and standard deviation from the training set. Because the parameters learned are with respect to the statistics of the training set, you must also apply the same transformations to any test data you want to submit to the prediction model.
Assuming you have new data points stored in a matrix called xx, you would do normalize then perform the predictions:
xxnew = bsxfun(#rdivide, bsxfun(#minus, xx, mX), sX);
Now that you have this, you can perform your predictions:
pred = sigmoid(xxnew*theta) >= 0.5;
You can change the threshold of 0.5 to be whatever you believe is best that determines whether examples belong in the positive or negative class.
The learning rate is too large
As you mentioned in the comments, once you normalize the data the costs appear to be finite but then suddenly go to NaN after a few iterations. Normalization can only get you so far. If your learning rate or alpha is too large, each iteration will overshoot in the direction towards the minimum and would thus make the cost at each iteration oscillate or even diverge which is what is appearing to be happening. In your case, the cost is diverging or increasing at each iteration to the point where it is so large that it can't be represented using floating point precision.
As such, one other option is to decrease your learning rate alpha until you see that the cost function is decreasing at each iteration. A popular method to determine what the best learning rate would be is to perform gradient descent on a range of logarithmically spaced values of alpha and seeing what the final cost function value is and choosing the learning rate that resulted in the smallest cost.
Using the two facts above together should allow gradient descent to converge quite nicely, assuming that the cost function is convex. In this case for logistic regression, it most certainly is.
Let's assume you have an observation where:
the true value is y_i = 1
your model is quite extreme and says that P(y_i = 1) = 1
Then your cost function will get a value of NaN because you're adding 0 * log(0), which is undefined. Hence:
Your formula for the cost function has a problem (there is a subtle 0, infinity issue)!
As #rayryeng pointed out, 0 * log(0) produces a NaN because 0 * Inf isn't kosher. This is actually a huge problem: if your algorithm believes it can predict a value perfectly, it incorrectly assigns a cost of NaN.
Instead of:
cost = sum(-y .* log(htheta) - (1-y) .* log(1-htheta));
You can avoid multiplying 0 by infinity by instead writing your cost function in Matlab as:
y_logical = y == 1;
cost = sum(-log(htheta(y_logical))) + sum( - log(1 - htheta(~y_logical)));
The idea is if y_i is 1, we add -log(htheta_i) to the cost, but if y_i is 0, we add -log(1 - htheta_i) to the cost. This is mathematically equivalent to -y_i * log(htheta_i) - (1 - y_i) * log(1- htheta_i) but without running into numerical problems that essentially stem from htheta_i being equal to 0 or 1 within the limits of double precision floating point.
It happened to me because an indetermination of the type:
0*log(0)
This can happen when one of the predicted values Y equals either 0 or 1.
In my case the solution was to add an if statement to the python code as follows:
y * np.log (Y) + (1-y) * np.log (1-Y) if ( Y != 1 and Y != 0 ) else 0
This way, when the actual value (y) and the predicted one (Y) are equal, no cost needs to be computed, which is the expected behavior.
(Notice that when a given Y is converging to 0 the left addend is canceled (because of y=0) and the right addend tends toward 0. The same happens when Y converges to 1, but with the opposite addend.)
(There is also a very rare scenario, which you probably won't need to worry about, where y=0 and Y=1 or viceversa, but if your dataset is standarized and the weights are properly initialized it won't be an issue.)
I am implementing logistic regression using batch gradient descent. There are two classes into which the input samples are to be classified. The classes are 1 and 0. While training the data, I am using the following sigmoid function:
t = 1 ./ (1 + exp(-z));
where
z = x*theta
And I am using the following cost function to calculate cost, to determine when to stop training.
function cost = computeCost(x, y, theta)
htheta = sigmoid(x*theta);
cost = sum(-y .* log(htheta) - (1-y) .* log(1-htheta));
end
I am getting the cost at each step to be NaN as the values of htheta are either 1 or zero in most cases. What should I do to determine the cost value at each iteration?
This is the gradient descent code for logistic regression:
function [theta,cost_history] = batchGD(x,y,theta,alpha)
cost_history = zeros(1000,1);
for iter=1:1000
htheta = sigmoid(x*theta);
new_theta = zeros(size(theta,1),1);
for feature=1:size(theta,1)
new_theta(feature) = theta(feature) - alpha * sum((htheta - y) .*x(:,feature))
end
theta = new_theta;
cost_history(iter) = computeCost(x,y,theta);
end
end
There are two possible reasons why this may be happening to you.
The data is not normalized
This is because when you apply the sigmoid / logit function to your hypothesis, the output probabilities are almost all approximately 0s or all 1s and with your cost function, log(1 - 1) or log(0) will produce -Inf. The accumulation of all of these individual terms in your cost function will eventually lead to NaN.
Specifically, if y = 0 for a training example and if the output of your hypothesis is log(x) where x is a very small number which is close to 0, examining the first part of the cost function would give us 0*log(x) and will in fact produce NaN. Similarly, if y = 1 for a training example and if the output of your hypothesis is also log(x) where x is a very small number, this again would give us 0*log(x) and will produce NaN. Simply put, the output of your hypothesis is either very close to 0 or very close to 1.
This is most likely due to the fact that the dynamic range of each feature is widely different and so a part of your hypothesis, specifically the weighted sum of x*theta for each training example you have will give you either very large negative or positive values, and if you apply the sigmoid function to these values, you'll get very close to 0 or 1.
One way to combat this is to normalize the data in your matrix before performing training using gradient descent. A typical approach is to normalize with zero-mean and unit variance. Given an input feature x_k where k = 1, 2, ... n where you have n features, the new normalized feature x_k^{new} can be found by:
m_k is the mean of the feature k and s_k is the standard deviation of the feature k. This is also known as standardizing data. You can read up on more details about this on another answer I gave here: How does this code for standardizing data work?
Because you are using the linear algebra approach to gradient descent, I'm assuming you have prepended your data matrix with a column of all ones. Knowing this, we can normalize your data like so:
mX = mean(x,1);
mX(1) = 0;
sX = std(x,[],1);
sX(1) = 1;
xnew = bsxfun(#rdivide, bsxfun(#minus, x, mX), sX);
The mean and standard deviations of each feature are stored in mX and sX respectively. You can learn how this code works by reading the post I linked to you above. I won't repeat that stuff here because that isn't the scope of this post. To ensure proper normalization, I've made the mean and standard deviation of the first column to be 0 and 1 respectively. xnew contains the new normalized data matrix. Use xnew with your gradient descent algorithm instead. Now once you find the parameters, to perform any predictions you must normalize any new test instances with the mean and standard deviation from the training set. Because the parameters learned are with respect to the statistics of the training set, you must also apply the same transformations to any test data you want to submit to the prediction model.
Assuming you have new data points stored in a matrix called xx, you would do normalize then perform the predictions:
xxnew = bsxfun(#rdivide, bsxfun(#minus, xx, mX), sX);
Now that you have this, you can perform your predictions:
pred = sigmoid(xxnew*theta) >= 0.5;
You can change the threshold of 0.5 to be whatever you believe is best that determines whether examples belong in the positive or negative class.
The learning rate is too large
As you mentioned in the comments, once you normalize the data the costs appear to be finite but then suddenly go to NaN after a few iterations. Normalization can only get you so far. If your learning rate or alpha is too large, each iteration will overshoot in the direction towards the minimum and would thus make the cost at each iteration oscillate or even diverge which is what is appearing to be happening. In your case, the cost is diverging or increasing at each iteration to the point where it is so large that it can't be represented using floating point precision.
As such, one other option is to decrease your learning rate alpha until you see that the cost function is decreasing at each iteration. A popular method to determine what the best learning rate would be is to perform gradient descent on a range of logarithmically spaced values of alpha and seeing what the final cost function value is and choosing the learning rate that resulted in the smallest cost.
Using the two facts above together should allow gradient descent to converge quite nicely, assuming that the cost function is convex. In this case for logistic regression, it most certainly is.
Let's assume you have an observation where:
the true value is y_i = 1
your model is quite extreme and says that P(y_i = 1) = 1
Then your cost function will get a value of NaN because you're adding 0 * log(0), which is undefined. Hence:
Your formula for the cost function has a problem (there is a subtle 0, infinity issue)!
As #rayryeng pointed out, 0 * log(0) produces a NaN because 0 * Inf isn't kosher. This is actually a huge problem: if your algorithm believes it can predict a value perfectly, it incorrectly assigns a cost of NaN.
Instead of:
cost = sum(-y .* log(htheta) - (1-y) .* log(1-htheta));
You can avoid multiplying 0 by infinity by instead writing your cost function in Matlab as:
y_logical = y == 1;
cost = sum(-log(htheta(y_logical))) + sum( - log(1 - htheta(~y_logical)));
The idea is if y_i is 1, we add -log(htheta_i) to the cost, but if y_i is 0, we add -log(1 - htheta_i) to the cost. This is mathematically equivalent to -y_i * log(htheta_i) - (1 - y_i) * log(1- htheta_i) but without running into numerical problems that essentially stem from htheta_i being equal to 0 or 1 within the limits of double precision floating point.
It happened to me because an indetermination of the type:
0*log(0)
This can happen when one of the predicted values Y equals either 0 or 1.
In my case the solution was to add an if statement to the python code as follows:
y * np.log (Y) + (1-y) * np.log (1-Y) if ( Y != 1 and Y != 0 ) else 0
This way, when the actual value (y) and the predicted one (Y) are equal, no cost needs to be computed, which is the expected behavior.
(Notice that when a given Y is converging to 0 the left addend is canceled (because of y=0) and the right addend tends toward 0. The same happens when Y converges to 1, but with the opposite addend.)
(There is also a very rare scenario, which you probably won't need to worry about, where y=0 and Y=1 or viceversa, but if your dataset is standarized and the weights are properly initialized it won't be an issue.)
I am trying to graph the frequency response. I was asked to use filter in MATLAB, but since I've read the manual, I still don't get how it performs a Z transform.
I have the impulse response of the digital filter written below:
for i=1:22;
y(i)= 0;
end
x(22) = 1;
for k=23:2523
x(k) = 0;
end
for n = 22:2522;
y(n) = ((1/21)*x(n))+((20/21)*y(n-21));
end
plot(y);
It's just a feedback system of y[n] = 1/21*x[n] + 20/21*y[n-21]
Below are my calculations to compute the Z transform of the above system, which ultimately determines the impulse response:
Z(y) = Z((1/21)*x(n)+(20/21)*y(n-21))
Y(Z) = (1/21)X(Z)+(20/21)*Z.^-21Y(Z)
Z(Z)-(20/21)*Z.^-21Y(Z) = (1/21)X(Z)
Y(Z)(1-(20/21)*Z.^-21) = (1/21)X(Z) // divide by X(Z)*(1-(20/21)*Z.^-21)
Y(Z)/X(Z) = (1/21)/(1-(20/21)*Z.^-21)
H(Z) = (1/21)/(1-(20/21)*Z.^-21) // B = 1/21, A = 20/21
H(Z) = (B*Z.^21)/(Z.^21-A)
How can I plot the frequency response of H(Z)? Should I use filter?
If you just need to plot the impulse response it's easy. The impulse response is the response of the digital filter to a Dirac pulse. You already have the difference equation, so you're already in 'z' and you don't care about the 's', you don't have to perform the 's' to 'z' transform (which is a topic in itself!).
So just generate a signal x(n) consisting of zeros everywhere except the first sample x(1) being 1. Pass it through the filter (yes, your difference equation). The y(n) you get is your impulse response h(n). That's basically what you did.
And of course if you FFT this h(n) you get the phase and magnitude response.
Use freqz from the signal processing toolbox (hope you have it). You first need to find the Z-transform, which you have already done here:
Y(Z)/X(Z) = (1/21)/(1-(20/21)*Z.^-21)
freqz takes in a vector of coefficients which correspond to the numerator and denominator of your transfer function. It's called like so:
freqz(b, a);
b and a are the numerator and denominator coefficients of your transfer function. This will produce a figure that shows the magnitude and phase response (hence frequency response) of the above system.
Therefore, all you need is to do this:
b = 1/21;
a = [1 zeros(1,20) -(20/21)];
freqz(b, a)
Take special note of the a vector. It has 1, followed by 20 zeros, then followed by -(20/21). Because you have a coefficient to the power of -21 and nothing else other than the 1 before it, that means that those coefficients between -1 to -20 are zero, and there are 20 of these coefficients that are zero in total, which is why we need to fill in the vector with zeroes between the 1 and -(20/21) term.
We get:
If you want to plot the poles and zeroes of your filter, use a combination of tf and pzmap:
sys = tf(b, a, -1);
pzmap(sys);
tf creates a transfer function by specifying the numerator and denominator coefficients of your filter, and -1 implies it's a discrete-time filter, but we don't know what the sampling time is. pzmap plots the poles and zeroes in the z-domain with the unit-circle overlaid.
We get this:
This makes sense as you have no zeroes in your system, and 21 poles, as you have 21 delay elements when examining the discrete-time sequence in your example.
From Matlab's filter documentation:
filters the input data, x, using a rational transfer function defined by the numerator and denominator coefficients b and a, respectively.
As you might know, filtering an impulse input would give you the impulse response. It then remains to obtain those b and a coefficients.
You could either obtain those directly from the difference equation
y[n] = 1/21*x[n] + 20/21*y[n-21];
(as indicated in the rational transfer function link above) or equivalently from the rational transfer function you have derived:
%H(Z) = (B*Z.^21)/(Z.^21-A)
In either case, you should get the following a and b coefficients:
a = [1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -20/21];
% or equivalently: a=zeros(22,1); a(1)=1; a(22)=-20/21;
b = [1/21];
Thus,
% setup the impulse input
x = zeros(2500,1);
x(1) = 1;
% compute the impulse response
y = filter(b, a, x);
This impulse response can be plotted as you have done:
plot(y);
As far as how this related to the transform H(Z), the closed form expression you obtained can be evaluated in terms of a Laurent series expansion, which then has the coefficients of the time-domain y (impulse response) series.
However, H(z) is an analytic function defined in the |z| > R (where R=power(20/21,1/21) in your case) region of convergence in the complex plane. It is more typical to plot the frequency response, which corresponds to the H(z) evaluated on the unit-circle (i.e. for complex number satisfying |z|=1 or equivalently z = exp(j * theta) with theta in the [0-2pi] range). An efficient method to compute values of H(z) at regularly spaced points on that unit-circle is to take the FFT of the impulse response:
FrequencyResponse = fft(y);
figure(1);
plot(abs(FrequencyResponse));
figure(2);
plot(phase(FrequencyResponse));
P.S.: the computation of filter and fft can be done in the single call freqz if you have the signal processing toolbox (although you were specifically asked to use filter).