I want to fit a cubic spline in Python to noisy x, y data and extract the spline coefficients for each interval (i.e. I would expect to obtain four spline coefficients for each interval)
So far, I have tried (all from scipy.interpolate):
1) CubicSpline, but this method does not allow me to smooth the spline, resulting in unrealistic, jumpy coefficient data.
2) Combining splrep and splev, e.g.
tck = splrep(x, y, k=3, s=1e25)
where I extract the coefficients/knots using
F = PPoly.from_spline(tck)
coeffs = F.c
knots = F.x
However, I cannot find smooth coefficients over the full x-range (jumps between values close to zero and 1e23, which is unphysical) even if I ramp up the smoothing parameter s to very large numbers that ultimately lead to too small numbers of knots since the number of knots decreases with s. It seems that I cannot find a suitable parameter s and number of knots at the same time.
3) I used
UnivariateSpline(x, y, k=3, s=0.03)
Here, I found a better sensitivity to changing s, but the corresponding get_coeffs() method does not provide 4 coefficients for each interval but only one, which I do not understand.
4) I also tried a piecewise ridged linear regression with a third order polynomial, but this method provides too large percentage errors for the fit, so it would be great to get one of the standard spline methods working.
What am I missing? Can someone help, please?
The concrete issue I see here is that UnivariateSpline does not yield the algebraic coefficients of various powers of x in the interpolating spline. This is because the coefficients it keeps in the private _data property, which it also returns with get_coeffs method, are a kind of B-spline coefficients. These coefficients describe the spline without any redundancy (you need N of them for a spline with N degrees of freedom), but the basis splines that they are attached to are somewhat complicated.
But you can get the kind of coefficients you want by using the derivatives method of the spline object. It returns all four derivatives at a given point x, from which the Taylor coefficients at that point are easy to find. It is natural to use this method with x being the knots of interpolation, excluding the rightmost one; the coefficients obtained are valid from that knot to the next one. Here is an example, complete with "fancy" formatted output.
import numpy as np
from scipy.interpolate import UnivariateSpline
spl = UnivariateSpline(np.arange(6), np.array([3, 1, 4, 1, 5, 9]), s=0)
kn = spl.get_knots()
for i in range(len(kn)-1):
cf = [1, 1, 1/2, 1/6] * spl.derivatives(kn[i])
print("For {0} <= x <= {1}, p(x) = {5}*(x-{0})^3 + {4}*(x-{0})^2 + {3}*(x-{0}) + {2}".format(kn[i], kn[i+1], *cf))
The knots are 0, 2, 3, 5 in this example. The output is:
For 0.0 <= x <= 2.0, p(x) = -3.1222222222222222*(x-0.0)^3 + 11.866666666666667*(x-0.0)^2 + -10.744444444444445*(x-0.0) + 3.000000000000001
For 2.0 <= x <= 3.0, p(x) = 4.611111111111111*(x-2.0)^3 + -6.866666666666667*(x-2.0)^2 + -0.7444444444444436*(x-2.0) + 4.000000000000001
For 3.0 <= x <= 5.0, p(x) = -2.322222222222221*(x-3.0)^3 + 6.966666666666665*(x-3.0)^2 + -0.6444444444444457*(x-3.0) + 1.0000000000000016
Note that for each piece, cf holds the coefficients starting with the lowest degree, so the order is reversed when formatting the string.
(Of course, you'd probably want to do something else with these coefficients)
To check that the formulas are correct, I copy-pasted them for plotting:
Related
I am pretty new to MATLAB computation and would appreciate any help on this. Firstly, I am trying to integrate a cosine function using the McLaurin expansion [cos(z) = 1 − (z^2)/2 + (z^4)/4 + ....] and lets say plotted over one cycle from 0 to 2π. This first integral (as seen in Figure 1) would serve as a "reference" or "contour" for what I want to do next.
Figure 1 - "The first integral representation"
Now, my next problem comes from writing in MATLAB cos(z) in
terms of an integral in the complex plane.
Figure 2 - "showing cos(z) as an integral in the complex plane"
Where I could choose an equi-sampled set of points 'sn' along the contour, from 'sL' to 'sU'
and separated by '∆s'. This is Euler’s method.
I am trying to write a code to numerically approximate the integral
along a suitable contour and plot the approximation against the
exact value of cos(z). I would like to do this twice - once for
z ∈ [0, 6π] and once for complex valued z in the range
z ∈ [0 + i, 6π + ]. Then to plot both the real and imaginary part of the
computed cos(z)
I am aware of the steps I am looking to implement which I'll bullet-point here.
Choose γ, SL, SU , N.
Step through z from z lower to z upper (use a different
number of steps (other than N) for this).
For each value of z compute cos z by stepping along the contour in
N discrete steps from SL to SU .
For each value of sn along the contour compute the integrand e^(sn-(z^2/4sn))/sqrt(sn) and add it to the rolling sum [I have attached figure 3 showing an image formula of the integrand if its not clear!] Figure 3 - "The exponential integrand I am looking to compute"
Now I will show what I have attempted in MATLAB!
% "Contour Integral Representation of Cosine"
% making 'z' a possible number such as pi
N = 10000000; % example number - meaning sample of steps
z_lower = 0;
z_upper = 6*pi;
%==========================%
z1 = linspace(z_lower,z_upper,N);
y = 1;
Sl = y - 10*1i;
sum = 0.0;
%==========================%
for z = linspace(z_lower,z_upper,N)
for Sn = linspace(Sl,Su,N)
sum = sum + ((exp(Sn) - (z.^2/4*Sn))/sqrt(Sn))*ds;
end
end
sum = sum*(sqrt(pi)/2*pi*1i);
plot(Sn,sum)
Edit1: Hiya, this figure will show what I am expecting - the numerical method to be not exactly the same as the "symbolic" integration lets say. In figure 4, the black cosine wave is the same as in figure 1 and the blue is the numerical integration method.Figure 4 - "End Result of what I expect to plot
I'm trying to reproduce a function from a paper, which is specified only in terms of spline knots and coefficients. After finding this on stackoverflow, given a scipy interpolation object, from its knots and coefficients, I can recreate the scipy interpolation. However, the approach fails for the function specified in the paper. To reproduce a scipy interpolation I can do this:
using PyCall, PyPlot, Random
Random.seed!(5)
sp = pyimport("scipy.interpolate")
x = LinRange(0,1,50)
y = (0.9 .+ 0.1rand(length(x))).*sin.(2*pi*(x.-0.5))
t = collect(x[2:2:end-1]) # knots
s1 = sp.LSQUnivariateSpline(x, y, t)
x2 = LinRange(0, 1, 200) # new x-grid
y2 = s1(x2) # evaluate spline on that new grid
figure()
plot(x, y, label="original")
plot(x2, y2, label="interp", color="k")
knots = s1.get_knots()
c = s1.get_coeffs()
newknots(knots, k) = vcat(fill(knots[1],k),knots,fill(knots[end],k)) # func for boundary knots of order k
forscipyknots = newknots(knots, 3)
s2 = sp.BSpline(forscipyknots, c, 3)
y3 = s2(x2)
plot(x2,y3,"--r", label="reconstructed \nfrom knots and coeff")
legend()
Which provides the following as expected:
On trying to reproduce a function (image below) with specified knots = [.4,.4,.4,.4,.7] and coefficients c = [2,-5,5,2,-3,-1,2] which is supposed to produce:
With the below code and above knots and coefficients:
knots = [.4,.4,.4,.4,.7]
c = [2,-5,5,2,-3,-1,2]
forscipyknots = newknots(knots, 3)
s2 = sp.BSpline(forscipyknots, c, 3)
figure()
plot(x2, s2(x2))
I get the following (below) instead. I'm sure I'm messing up the boundary knots - how can I fix this?
Short answer:
The inner-knot sequence t=[0.4,0.4,0.4,0.4,0.7] and the parameters c=[2,-5,5,2,-3,-1,2] do not allow a spline to be constructed, the example contains an error (more on this later). The best you can get out of it is to remove one of the 0.4 knots and construct a quadratic (second-degree) spline as follows
tt = [0.0,0.0,0.0,0.4,0.4,0.4,0.7,1.0,1.0,1.0]
c = [2,-5,5,2,-3,-1,2]
s2 = BSpline(tt,c,2)
This produces the following graph
Long answer:
The knot sequence in Example 3 contains only the inner knots, therefore, you need to add the boundary knots. Since you want to evaluate the spline on the interval [0,1] the full knot sequence needs to cover the points 0 and 1. The simplest is to add 0 to the beginning and 1 to the end of the sequence and replicate them as necessary according to the desired degree of the spline. A cubic (third-degree) spline would require four boundary knots (i.e. four zeros and four ones) and a quadratic spline would require three boundary knots (three zeros and three ones).
There is a problem, however. A cubic spline would require 9 parameters, while Example 3 only gives you 7. Hence, you cannot construct a cubic spline from this. With the seven parameters given, you could construct a quadratic spline, but, there, the problem is that for quadratic splines each point can only appear at most three times in the inner-knot sequence. And 0.4 appears four times (which would suggest a cubic spline). Hence, all you can do is to remove one of the 0.4 knots and construct a second-degree spline as in the short answer above.
Now I will explain what you did wrong. In the first example you obtained the knot sequence from an existing spline using knots = s1.get_knots(), which gave you knots=[0,0.02,0.04,...,0.98,1]. This sequence contains the boundary knots 0 and 1 (although only once). Hence, to construct a cubic spline, you replicated each of these three times to obtain forscipyknots = [0,0,0,0,0.02,0.04,...,0.98,1,1,1,1]. So far so good.
In Example 3, however, the knot sequence does not contain the boundary points. As you did the same as before, you ended up replicating the 0.4 and 0.7 knots three times resulting in forscipyknots = [0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.7,0.7,0.7,0.7]. You cannot construct a spline on this sequence, whatever comes out of this is not a spline. What you needed instead was forscipyknots = [0.0,0.0,0.0,0.0,0.4,0.4,0.4,0.4,0.7,1.0,1.0,1.0,1.0] (which would not have worked because you do not have enough coefficients; but you could try this with your own, for instance, c = [1,2,-5,5,2,-3,-1,2,1]). To do this, you needed to add 0 to the beginning and 1 to the end of the array and only then use your newknots function.
Just as an example, a cubic spline could look like this
tt = [0.0,0.0,0.0,0.0,0.4,0.4,0.4,0.4,0.7,1.0,1.0,1.0,1.0]
c = [1,2,-5,5,2,-3,-1,2,1]
s2 = BSpline(tt,c,3)
I am implementing logistic regression using batch gradient descent. There are two classes into which the input samples are to be classified. The classes are 1 and 0. While training the data, I am using the following sigmoid function:
t = 1 ./ (1 + exp(-z));
where
z = x*theta
And I am using the following cost function to calculate cost, to determine when to stop training.
function cost = computeCost(x, y, theta)
htheta = sigmoid(x*theta);
cost = sum(-y .* log(htheta) - (1-y) .* log(1-htheta));
end
I am getting the cost at each step to be NaN as the values of htheta are either 1 or zero in most cases. What should I do to determine the cost value at each iteration?
This is the gradient descent code for logistic regression:
function [theta,cost_history] = batchGD(x,y,theta,alpha)
cost_history = zeros(1000,1);
for iter=1:1000
htheta = sigmoid(x*theta);
new_theta = zeros(size(theta,1),1);
for feature=1:size(theta,1)
new_theta(feature) = theta(feature) - alpha * sum((htheta - y) .*x(:,feature))
end
theta = new_theta;
cost_history(iter) = computeCost(x,y,theta);
end
end
There are two possible reasons why this may be happening to you.
The data is not normalized
This is because when you apply the sigmoid / logit function to your hypothesis, the output probabilities are almost all approximately 0s or all 1s and with your cost function, log(1 - 1) or log(0) will produce -Inf. The accumulation of all of these individual terms in your cost function will eventually lead to NaN.
Specifically, if y = 0 for a training example and if the output of your hypothesis is log(x) where x is a very small number which is close to 0, examining the first part of the cost function would give us 0*log(x) and will in fact produce NaN. Similarly, if y = 1 for a training example and if the output of your hypothesis is also log(x) where x is a very small number, this again would give us 0*log(x) and will produce NaN. Simply put, the output of your hypothesis is either very close to 0 or very close to 1.
This is most likely due to the fact that the dynamic range of each feature is widely different and so a part of your hypothesis, specifically the weighted sum of x*theta for each training example you have will give you either very large negative or positive values, and if you apply the sigmoid function to these values, you'll get very close to 0 or 1.
One way to combat this is to normalize the data in your matrix before performing training using gradient descent. A typical approach is to normalize with zero-mean and unit variance. Given an input feature x_k where k = 1, 2, ... n where you have n features, the new normalized feature x_k^{new} can be found by:
m_k is the mean of the feature k and s_k is the standard deviation of the feature k. This is also known as standardizing data. You can read up on more details about this on another answer I gave here: How does this code for standardizing data work?
Because you are using the linear algebra approach to gradient descent, I'm assuming you have prepended your data matrix with a column of all ones. Knowing this, we can normalize your data like so:
mX = mean(x,1);
mX(1) = 0;
sX = std(x,[],1);
sX(1) = 1;
xnew = bsxfun(#rdivide, bsxfun(#minus, x, mX), sX);
The mean and standard deviations of each feature are stored in mX and sX respectively. You can learn how this code works by reading the post I linked to you above. I won't repeat that stuff here because that isn't the scope of this post. To ensure proper normalization, I've made the mean and standard deviation of the first column to be 0 and 1 respectively. xnew contains the new normalized data matrix. Use xnew with your gradient descent algorithm instead. Now once you find the parameters, to perform any predictions you must normalize any new test instances with the mean and standard deviation from the training set. Because the parameters learned are with respect to the statistics of the training set, you must also apply the same transformations to any test data you want to submit to the prediction model.
Assuming you have new data points stored in a matrix called xx, you would do normalize then perform the predictions:
xxnew = bsxfun(#rdivide, bsxfun(#minus, xx, mX), sX);
Now that you have this, you can perform your predictions:
pred = sigmoid(xxnew*theta) >= 0.5;
You can change the threshold of 0.5 to be whatever you believe is best that determines whether examples belong in the positive or negative class.
The learning rate is too large
As you mentioned in the comments, once you normalize the data the costs appear to be finite but then suddenly go to NaN after a few iterations. Normalization can only get you so far. If your learning rate or alpha is too large, each iteration will overshoot in the direction towards the minimum and would thus make the cost at each iteration oscillate or even diverge which is what is appearing to be happening. In your case, the cost is diverging or increasing at each iteration to the point where it is so large that it can't be represented using floating point precision.
As such, one other option is to decrease your learning rate alpha until you see that the cost function is decreasing at each iteration. A popular method to determine what the best learning rate would be is to perform gradient descent on a range of logarithmically spaced values of alpha and seeing what the final cost function value is and choosing the learning rate that resulted in the smallest cost.
Using the two facts above together should allow gradient descent to converge quite nicely, assuming that the cost function is convex. In this case for logistic regression, it most certainly is.
Let's assume you have an observation where:
the true value is y_i = 1
your model is quite extreme and says that P(y_i = 1) = 1
Then your cost function will get a value of NaN because you're adding 0 * log(0), which is undefined. Hence:
Your formula for the cost function has a problem (there is a subtle 0, infinity issue)!
As #rayryeng pointed out, 0 * log(0) produces a NaN because 0 * Inf isn't kosher. This is actually a huge problem: if your algorithm believes it can predict a value perfectly, it incorrectly assigns a cost of NaN.
Instead of:
cost = sum(-y .* log(htheta) - (1-y) .* log(1-htheta));
You can avoid multiplying 0 by infinity by instead writing your cost function in Matlab as:
y_logical = y == 1;
cost = sum(-log(htheta(y_logical))) + sum( - log(1 - htheta(~y_logical)));
The idea is if y_i is 1, we add -log(htheta_i) to the cost, but if y_i is 0, we add -log(1 - htheta_i) to the cost. This is mathematically equivalent to -y_i * log(htheta_i) - (1 - y_i) * log(1- htheta_i) but without running into numerical problems that essentially stem from htheta_i being equal to 0 or 1 within the limits of double precision floating point.
It happened to me because an indetermination of the type:
0*log(0)
This can happen when one of the predicted values Y equals either 0 or 1.
In my case the solution was to add an if statement to the python code as follows:
y * np.log (Y) + (1-y) * np.log (1-Y) if ( Y != 1 and Y != 0 ) else 0
This way, when the actual value (y) and the predicted one (Y) are equal, no cost needs to be computed, which is the expected behavior.
(Notice that when a given Y is converging to 0 the left addend is canceled (because of y=0) and the right addend tends toward 0. The same happens when Y converges to 1, but with the opposite addend.)
(There is also a very rare scenario, which you probably won't need to worry about, where y=0 and Y=1 or viceversa, but if your dataset is standarized and the weights are properly initialized it won't be an issue.)
I am trying to understand the normalization and "unnormalization" steps in the direct least squares ellipse fitting algorithm developed by Fitzgibbon, Pilu and Fisher (improved by Halir and Flusser).
EDITED: More details about theory added. Is eigenvalue problem where confusion stems from?
Short theory:
The ellipse is represented by an implicit second order polynomial (general conic equation):
where:
To constrain this general conic to an ellipse, the coefficients must satisfy the quadratic constraint:
which is equivalent to:
where C is a matrix of zeros except:
The design matrix D is composed of all data points x sub i.
The minimization of the distance between a conic and the data points can be expressed by a generalized eigenvalue problem (some theory has been omitted):
Denoting:
We now have the system:
If we solve this system, the eigenvector corresponding to the single positive eigenvalue is the correct answer.
The code:
The code snippets here are directly from the MATLAB code provided by the authors:
http://research.microsoft.com/en-us/um/people/awf/ellipse/fitellipse.html
The data input is a series of (x,y) points. The points are normalized by subtracting the mean and dividing by the standard deviation (in this case, computed as half the range). I'm assuming this normalization allows for a better fit of the data.
% normalize data
% X and Y are the vectors of data points, not normalized
mx = mean(X);
my = mean(Y);
sx = (max(X)-min(X))/2;
sy = (max(Y)-min(Y))/2;
x = (X-mx)/sx;
y = (Y-my)/sy;
% Build design matrix
D = [ x.*x x.*y y.*y x y ones(size(x)) ];
% Build scatter matrix
S = D'*D; %'
% Build 6x6 constraint matrix
C(6,6) = 0; C(1,3) = -2; C(2,2) = 1; C(3,1) = -2;
[gevec, geval] = eig(S,C);
% Find the negative eigenvalue
I = find(real(diag(geval)) < 1e-8 & ~isinf(diag(geval)));
% Extract eigenvector corresponding to negative eigenvalue
A = real(gevec(:,I));
After this, the normalization is reversed on the coefficients:
par = [
A(1)*sy*sy, ...
A(2)*sx*sy, ...
A(3)*sx*sx, ...
-2*A(1)*sy*sy*mx - A(2)*sx*sy*my + A(4)*sx*sy*sy, ...
-A(2)*sx*sy*mx - 2*A(3)*sx*sx*my + A(5)*sx*sx*sy, ...
A(1)*sy*sy*mx*mx + A(2)*sx*sy*mx*my + A(3)*sx*sx*my*my ...
- A(4)*sx*sy*sy*mx - A(5)*sx*sx*sy*my ...
+ A(6)*sx*sx*sy*sy ...
]';
At this point, I'm not sure what happened. Why is the unnormalization of the last three coefficients of A (d, e, f) dependent on the first three coefficients? How do you mathematically show where these unnormalization equations come from?
The 2 and 1 coefficients in the unnormalization lead me to believe the constraint matrix must be involved somehow.
Please let me know if more detail is needed on the method...it seems I'm missing how the normalization has propagated through the matrices and eigenvalue problem.
Any help is appreciated. Thanks!
At first, let me formalize the problem in a homogeneous space (as used in Richard Hartley and Andrew Zisserman's book Multiple View Geometry):
Assume that,
P=[X,Y,1]'
is our point in the unnormalized space, and
p=lambda*[x,y,1]'
is our point in the normalized space, where lambda is an unimportant free scale (in homogeneous space [x,y,1]=[10*x,10*y,10] and so on).
Now it is clear that we can write
x = (X-mx)/sx;
y = (Y-my)/sy;
as a simple matrix equation like:
p=H*P; %(equation (1))
where
H=[1/sx, 0, -mx/sx;
0, 1/sy, -my/sy;
0, 0, 1];
Also we know that an ellipse with the equation
A(1)*x^2 + A(2)*xy + A(3)*y^2 + A(4)*x + A(5)*y + A(6) = 0 %(first representation)
can be written in matrix form as:
p'*C*p=0 %you can easily verify this by matrix multiplication
where
C=[A(1), A(2)/2, A(4)/2;
A(2)/2, A(3), A(5)/2;
A(4)/2, A(5)/2, A(6)]; %second representation
and
p=[x,y,1]
and it is clear that these two representations of an ellipse are exactly the same and equivalent.
Also we know that the vector A=[A(1),A(2),A(3),A(4),A(5),A(6)] is a type-1 representation of the ellipse in the normalized space.
So we can write:
p'*C*p=0
where p is the normalized point and C is as defined previously.
Now we can use the "equation (1): p=HP" to derive some good result:
(H*P)'*C*(H*P)=0
=====>
P'*H'*C*H*P=0
=====>
P'*(H'*C*H)*P=0
=====>
P'*(C1)*P=0 %(equation (2))
We see that the equation (2) is an equation of an ellipse in the unnormalized space where C1 is the type-2 representation of ellipse and we know that:
C1=H'*C*H
Ans also, because the equation (2) is a zero equation we can multiply it by any non-zero number. So we multiply it by sx^2*sy^2 and we can write:
C1=sx^2*sy^2*H'*C*H
And finally we get the result
C1=[ A(1)*sy^2, (A(2)*sx*sy)/2, (A(4)*sx*sy^2)/2 - A(1)*mx*sy^2 - (A(2)*my*sx*sy)/2;
(A(2)*sx*sy)/2, A(3)*sx^2, (A(5)*sx^2*sy)/2 - A(3)*my*sx^2 - (A(2)*mx*sx*sy)/2;
-(- (A(4)*sx^2*sy^2)/2 + (A(2)*my*sx^2*sy)/2 + A(1)*mx*sx*sy^2)/sx, -(- (A(5)*sx^2*sy^2)/2 + A(3)*my*sx^2*sy + (A(2)*mx*sx*sy^2)/2)/sy, (mx*(- (A(4)*sx^2*sy^2)/2 + (A(2)*my*sx^2*sy)/2 + A(1)*mx*sx*sy^2))/sx + (my*(- (A(5)*sx^2*sy^2)/2 + A(3)*my*sx^2*sy + (A(2)*mx*sx*sy^2)/2))/sy + A(6)*sx^2*sy^2 - (A(4)*mx*sx*sy^2)/2 - (A(5)*my*sx^2*sy)/2]
which can be transformed into the type-2 ellipse and get the exact result we were looking for:
[ A(1)*sy^2, A(2)*sx*sy, A(3)*sx^2, A(4)*sx*sy^2 - 2*A(1)*mx*sy^2 - A(2)*my*sx*sy, A(5)*sx^2*sy - 2*A(3)*my*sx^2 - A(2)*mx*sx*sy, A(2)*mx*my*sx*sy + A(1)*mx*my*sy^2 + A(3)*my^2*sx^2 + A(6)*sx^2*sy^2 - A(4)*mx*sx*sy^2 - A(5)*my*sx^2*sy]
If you are curious how I managed to caculate these time-consuming equations I can give you the matlab code to do it for you as follows:
syms sx sy mx my
syms a b c d e f
C=[a, b/2, d/2;
b/2, c, e/2;
d/2, e/2, f];
H=[1/sx, 0, -mx/sx;
0, 1/sy, -my/sy;
0, 0, 1];
C1=sx^2*sy^2*H.'*C*H
a=[Cp(1,1), 2*Cp(1,2), Cp(2,2), 2*Cp(1,3), 2*Cp(2,3), Cp(3,3)]
I'd like to generate the frequency spectrum of seven concatenated cosine functions.
I am unsure whether my code is correct; in particular, whether N = time*freq*7 is correct, or whether it should be N = time*freq (without the times seven).
My code is as follow:
sn = [1, 2, 3, 4, 5, 6, 7, 8];
time = 1;
freq = 22050;
N = time*freq*7;
dt = 1/freq;
t = 0 : dt : time - dt;
y = #(sn, phasePosNeg) cos(2*pi*(1200-100*sn) * t + phasePosNeg * sn*pi/10);
f = [y(sn(1), 1), y(sn(2), -1), y(sn(3), 1), y(sn(4), -1), y(sn(5), 1), y(sn(6), -1), y(sn(7), 1)];
F = abs(fftshift(fft(f)))/N;
df = freq/N;
faxis = -freq/2 : df : (freq/2-1/freq);
plot(faxis, F);
grid on;
axis([-1500, 1500, 0, 0.6]);
title('Frequency Spectrum Of Concatenated Cosine Functions');
xlabel('Frequency (Hz)');
ylabel('Magnitude');
I suppose the essense of my question is:
Should the height of the spikes equal 1/7 of 0.5, or simply 0.5? (All cosine functions have an amplitude of 1.)
Thank you.
Let me correct/help you on a few things:
1) the fourier transform is typically displayed in dB for its magnitute. 20*log base10(FFT coeff)
2) there is no need to divide your FFT amplitudes by any value of N.
F = abs(fftshift(fft(f)))/N; %get rid of the N or N =1
3) if N is the number of points in your FFT N = size(t); because you've taken that many samples of the sin/cos functions
4) When plotting the function remember that the FFT spans from -Pi to + Pi and you need to remap it to the frequency spectrum using your sampling frequency
5) becasue of the large phase discontinuties between these functions, dont expect the forrier transform the be a bunch of large narrow peaks. (otherwise Phase Modulation would be the modulationscheme of choice... zero bandwidth)
This is clearly homework, so I'm just going to give some direction: In the concatenated case, think of it as though you're adding six waveforms, which are each padded by zeros (6x the length of the waveform), and then offset these so they don't overlap and then added together to form the concatenated waveform. In the case where you're adding the FFTs of the individual waveforms, also remember that you're assuming that they are periodic. So you basically need to handle the zero padding so that you're comparing apples to apples. One check, of course, is to just use one waveform throughout and make sure it works in this case (and this should return exactly the same result since the FFT assumes that waveform is periodic -- that is, infinitely concatenated to itself).
Thinking of this in terms of Parseval's Theorem will probably be helpful in figuring out how to interpret and normalize things.
It is correct to use N=(7*time)*freq, since your actual time of the waveform is 7*time, regardless of how you constructed it.
Several of the comments talk about discontinuities, but it should be noted that these usually exist in the FFT anyway, since the FFT waveform is assumed to be periodic, and that usually means that there are effectively discontinuities at the boundaries even in the non-concatenated case.