Suppose I have a vector t = [0 0.1 0.9 1 1.4] and a vector x = [1 3 5 2 3]. How can I compute the derivative of x with respect to time so that the result has the same length as the original vectors?
I should not use any symbolic operations. The command diff(x)./diff(t) does not produce a vector of the same length. Should I first interpolate the x(t) function and then take its derivative?
Different approaches exist to calculate the derivative at the same points as your initial data:
Finite differences: Use a central difference scheme at your inner points and a forward/backward scheme at your first/last point
or
Curve fitting: Fit a curve through your points, calculate the derivative of this fitted function and sample them at the same points as the original data. Typical fitting functions are polynomials or spline functions.
Note that the curve fitting approach gives better results, but needs more tuning options and is slower (~100x).
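Here is a minimal sketch of the finite-difference option, applied to the question's own t and x. It is written in Python using only the standard library (the thread itself uses MATLAB), and the function name same_length_derivative is hypothetical. It uses a forward difference at the first point, a backward difference at the last point, and a central difference over [t(i-1), t(i+1)] at the inner points. This handles the non-uniform spacing of t, and the result has the same length as x.

```python
def same_length_derivative(t, x):
    """Finite-difference dx/dt with len(result) == len(x).

    Forward difference at the first point, backward difference at the
    last point, central difference over [t[i-1], t[i+1]] in between.
    """
    n = len(t)
    if n != len(x) or n < 2:
        raise ValueError("t and x must have the same length (at least 2)")
    d = [0.0] * n
    d[0] = (x[1] - x[0]) / (t[1] - t[0])          # forward difference
    d[-1] = (x[-1] - x[-2]) / (t[-1] - t[-2])     # backward difference
    for i in range(1, n - 1):                     # central differences
        d[i] = (x[i + 1] - x[i - 1]) / (t[i + 1] - t[i - 1])
    return d


t = [0, 0.1, 0.9, 1, 1.4]
x = [1, 3, 5, 2, 3]
dxdt = same_length_derivative(t, x)
print(dxdt)
```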
Demonstration
As an example, I will calculate the derivative of a sine function:
t = 0:0.1:1;
y = sin(t);
Its exact derivative is well known:
dy_dt_exact = cos(t);
The derivative can be approximated as follows:
Finite differences:
dy_dt_approx = zeros(size(y));
dy_dt_approx(1) = (y(2) - y(1))/(t(2) - t(1)); % forward difference
dy_dt_approx(end) = (y(end) - y(end-1))/(t(end) - t(end-1)); % backward difference
dy_dt_approx(2:end-1) = (y(3:end) - y(1:end-2))./(t(3:end) - t(1:end-2)); % central difference
or
Polynomial fitting:
p = polyfit(t,y,5); % fit fifth order polynomial
dp = polyder(p); % calculate derivative of polynomial
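For readers unfamiliar with polyder and polyval, the Python sketch below shows what they compute for coefficients stored highest power first, which is the order polyfit returns. The helpers poly_derivative and poly_eval are hypothetical names for illustration, not a library API. The derivative multiplies each coefficient by its power, and evaluation uses Horner's scheme.

```python
def poly_derivative(p):
    """Coefficients of dp/dt for p given highest power first (polyfit order)."""
    n = len(p) - 1                      # degree of p
    if n == 0:
        return [0.0]                    # derivative of a constant
    return [coef * (n - k) for k, coef in enumerate(p[:-1])]


def poly_eval(p, t):
    """Evaluate the polynomial p at t using Horner's scheme."""
    result = 0.0
    for coef in p:
        result = result * t + coef
    return result


# p(t) = 2 t^3 - t + 5, so p'(t) = 6 t^2 - 1
p = [2.0, 0.0, -1.0, 5.0]
dp = poly_derivative(p)
print(dp, poly_eval(dp, 0.5))
```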
The results can be visualised as follows:
figure('Name', 'Derivative')
hold on
plot(t, dy_dt_exact, 'DisplayName', 'exact');
plot(t, dy_dt_approx, 'DisplayName', 'finite difference');
plot(t, polyval(dp, t), 'DisplayName', 'polynomial');
legend show
figure('Name', 'Error')
hold on
plot(t, abs(dy_dt_approx - dy_dt_exact)/max(dy_dt_exact), 'DisplayName', 'finite difference');
plot(t, abs(polyval(dp, t) - dy_dt_exact)/max(dy_dt_exact), 'DisplayName', 'polynomial');
legend show
The first graph shows the derivatives themselves and the second graph plots the relative errors of both methods.
Discussion
One clearly sees that the curve fitting method gives better results than the finite differences, but it is ~100x slower. The curve fitting method has a relative error of order 10^-5. Note that the finite difference approach improves when your data is sampled more densely or when you use a higher-order scheme. The disadvantage of the curve fitting approach is that one has to choose a good polynomial order; spline functions may be better suited in general.
Sampling 10x more densely, i.e. t = 0:0.01:1, results in the following graphs:
[Figures: "Derivative" and "Error" for t = 0:0.01:1]
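You can check numerically how denser sampling affects the finite-difference error. This Python sketch (standard library only; helper names hypothetical) applies the same forward/central/backward scheme to sin(t) on t = 0:0.1:1 and t = 0:0.01:1. It returns the maximum of |approx - cos(t)| / max(cos(t)), which is the quantity plotted in the error figure.

```python
import math


def fd_derivative(t, y):
    """Forward/central/backward finite differences, same length as y."""
    n = len(t)
    d = [0.0] * n
    d[0] = (y[1] - y[0]) / (t[1] - t[0])
    d[-1] = (y[-1] - y[-2]) / (t[-1] - t[-2])
    for i in range(1, n - 1):
        d[i] = (y[i + 1] - y[i - 1]) / (t[i + 1] - t[i - 1])
    return d


def max_relative_error(step_count):
    """Max |approx - exact| / max(exact) for y = sin(t) on [0, 1]."""
    t = [k / step_count for k in range(step_count + 1)]  # like 0:1/step_count:1
    y = [math.sin(tk) for tk in t]
    exact = [math.cos(tk) for tk in t]
    approx = fd_derivative(t, y)
    scale = max(exact)
    return max(abs(a - e) / scale for a, e in zip(approx, exact))


coarse = max_relative_error(10)    # t = 0:0.1:1
fine = max_relative_error(100)     # t = 0:0.01:1
print(coarse, fine)
```

The largest error occurs at the last point. There the one-sided backward difference is only first-order accurate, while the central differences at the inner points are considerably more accurate.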
I am implementing logistic regression using batch gradient descent. There are two classes into which the input samples are to be classified. The classes are 1 and 0. While training the data, I am using the following sigmoid function:
t = 1 ./ (1 + exp(-z));
where
z = x*theta
And I am using the following cost function to calculate cost, to determine when to stop training.
function cost = computeCost(x, y, theta)
htheta = sigmoid(x*theta);
cost = sum(-y .* log(htheta) - (1-y) .* log(1-htheta));
end
I am getting the cost at each step to be NaN as the values of htheta are either 1 or zero in most cases. What should I do to determine the cost value at each iteration?
This is the gradient descent code for logistic regression:
function [theta,cost_history] = batchGD(x,y,theta,alpha)
cost_history = zeros(1000,1);
for iter=1:1000
htheta = sigmoid(x*theta);
new_theta = zeros(size(theta,1),1);
for feature=1:size(theta,1)
new_theta(feature) = theta(feature) - alpha * sum((htheta - y) .*x(:,feature));
end
theta = new_theta;
cost_history(iter) = computeCost(x,y,theta);
end
end
There are two possible reasons why this may be happening to you.
The data is not normalized
This is because when you apply the sigmoid / logit function to your hypothesis, the output probabilities are almost all approximately 0s or all 1s and with your cost function, log(1 - 1) or log(0) will produce -Inf. The accumulation of all of these individual terms in your cost function will eventually lead to NaN.
Specifically, suppose y = 0 for a training example and the output of your hypothesis is 0 to within double precision. Then the first term of the cost function evaluates to 0*log(0) = 0*(-Inf), which produces NaN. Similarly, suppose y = 1 for a training example and the output of your hypothesis is 1. Then the second term evaluates to 0*log(1 - 1) = 0*log(0), which again produces NaN. Simply put, the output of your hypothesis is either very close to 0 or very close to 1.
This is most likely because the dynamic ranges of your features differ widely. As a result, the weighted sum x*theta for each training example is either a very large negative or a very large positive value, and applying the sigmoid function to such values gives outputs very close to 0 or 1.
One way to combat this is to normalize the data in your matrix before training with gradient descent. A typical approach is to normalize to zero mean and unit variance. Given an input feature x_k, where k = 1, 2, ..., n and you have n features, the new normalized feature x_k^{new} can be found by:
$$x_k^{new} = \frac{x_k - m_k}{s_k}$$
m_k is the mean of the feature k and s_k is the standard deviation of the feature k. This is also known as standardizing data. You can read up on more details about this on another answer I gave here: How does this code for standardizing data work?
Because you are using the linear algebra approach to gradient descent, I'm assuming you have prepended your data matrix with a column of all ones. Knowing this, we can normalize your data like so:
mX = mean(x,1);
mX(1) = 0;
sX = std(x,[],1);
sX(1) = 1;
xnew = bsxfun(@rdivide, bsxfun(@minus, x, mX), sX);
The mean and standard deviation of each feature are stored in mX and sX respectively. You can learn how this code works by reading the post I linked to above; I won't repeat it here because it is outside the scope of this post. I set the mean and standard deviation of the first column to 0 and 1 respectively so that the column of ones is left unchanged by the normalization. xnew contains the new normalized data matrix; use xnew with your gradient descent algorithm instead.

Once you have found the parameters, you must normalize any new test instances with the mean and standard deviation from the training set before making predictions. The learned parameters are tied to the statistics of the training set, so you must apply the same transformation to any test data you submit to the prediction model.
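The same training-statistics rule can be sketched in Python (standard library only; the function names and the sample matrices are hypothetical). Column 0 is assumed to be the prepended column of ones, so it gets mean 0 and standard deviation 1 and passes through unchanged. statistics.stdev uses the N-1 normalization that MATLAB's std uses by default.

```python
import statistics


def fit_standardizer(X):
    """Per-column mean and sample standard deviation of the training matrix X.

    Column 0 is assumed to be the bias column of ones; its mean and standard
    deviation are forced to 0 and 1 so it is left unchanged (like mX(1) = 0,
    sX(1) = 1).
    """
    columns = list(zip(*X))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.stdev(col) for col in columns]  # N-1 normalization
    means[0], stds[0] = 0.0, 1.0
    return means, stds


def apply_standardizer(X, means, stds):
    """Standardize the rows of X with statistics from the training set."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


# Hypothetical training data: bias column plus two features on very different scales
X_train = [[1, 1000.0, 0.001], [1, 2000.0, 0.003], [1, 3000.0, 0.002]]
means, stds = fit_standardizer(X_train)
X_train_std = apply_standardizer(X_train, means, stds)
# New data is transformed with the *training* means and standard deviations
X_test_std = apply_standardizer([[1, 2500.0, 0.0025]], means, stds)
```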
Assuming you have new data points stored in a matrix called xx, you would first normalize them:
xxnew = bsxfun(@rdivide, bsxfun(@minus, xx, mX), sX);
Now that you have this, you can perform your predictions:
pred = sigmoid(xxnew*theta) >= 0.5;
You can change the threshold of 0.5 to whatever value you believe best separates the positive and negative classes.
The learning rate is too large
As you mentioned in the comments, once you normalize the data the costs appear to be finite but then suddenly go to NaN after a few iterations. Normalization can only get you so far. If your learning rate alpha is too large, each iteration overshoots the minimum, which makes the cost oscillate or even diverge; this appears to be what is happening. In your case, the cost diverges and keeps increasing at each iteration until it is so large that it can't be represented in floating point.
As such, another option is to decrease your learning rate alpha until you see the cost function decreasing at each iteration. A popular way to find the best learning rate is to run gradient descent for a range of logarithmically spaced values of alpha. Compare the final cost function value for each, and choose the learning rate that resulted in the smallest cost.
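A logarithmic sweep over alpha might look like the following Python sketch, which rests on a few assumptions. The toy data set is hypothetical, and the update uses the summed gradient as in the question's batchGD. The cost is evaluated from z = x*theta with a numerically stable softplus, so a too-large alpha cannot produce NaN while the sweep runs.

```python
import math


def softplus(z):
    """log(1 + exp(z)) computed without overflow."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cost(X, y, theta):
    """Summed logistic-regression cost, evaluated from z = x*theta."""
    total = 0.0
    for row, label in zip(X, y):
        z = sum(a * b for a, b in zip(row, theta))
        total += softplus(-z) if label == 1 else softplus(z)
    return total


def batch_gd(X, y, alpha, iters=100):
    """Batch gradient descent with the summed gradient, as in batchGD."""
    theta = [0.0] * len(X[0])
    for _ in range(iters):
        h = [sigmoid(sum(a * b for a, b in zip(row, theta))) for row in X]
        grad = [sum((h_i - y_i) * row[j] for h_i, y_i, row in zip(h, y, X))
                for j in range(len(theta))]
        theta = [th - alpha * g for th, g in zip(theta, grad)]
    return theta


# Hypothetical, non-separable toy data: bias column plus one standardized feature
X = [[1, -2.0], [1, -1.0], [1, 0.0], [1, 1.0], [1, 2.0]]
y = [0, 0, 1, 0, 1]

alphas = [10.0 ** k for k in range(-3, 2)]          # 0.001 ... 10
final_costs = {a: cost(X, y, batch_gd(X, y, a)) for a in alphas}
best_alpha = min(final_costs, key=final_costs.get)
print(best_alpha, final_costs)
```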
Using both fixes together should allow gradient descent to converge nicely, provided the cost function is convex, which it is for logistic regression.
Let's assume you have an observation where:
the true value is y_i = 1
your model is quite extreme and says that P(y_i = 1) = 1
Then your cost function will get a value of NaN because you're adding 0 * log(0), which is undefined. Hence:
Your formula for the cost function has a problem (there is a subtle 0, infinity issue)!
As @rayryeng pointed out, 0 * log(0) produces a NaN because 0 * Inf isn't kosher. This is actually a huge problem: if your algorithm believes it can predict a value perfectly, it incorrectly assigns a cost of NaN.
Instead of:
cost = sum(-y .* log(htheta) - (1-y) .* log(1-htheta));
You can avoid multiplying 0 by infinity by instead writing your cost function in Matlab as:
y_logical = y == 1;
cost = sum(-log(htheta(y_logical))) + sum( - log(1 - htheta(~y_logical)));
The idea is if y_i is 1, we add -log(htheta_i) to the cost, but if y_i is 0, we add -log(1 - htheta_i) to the cost. This is mathematically equivalent to -y_i * log(htheta_i) - (1 - y_i) * log(1- htheta_i) but without running into numerical problems that essentially stem from htheta_i being equal to 0 or 1 within the limits of double precision floating point.
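A further option, which is a swapped-in technique rather than part of this answer, is to never form htheta at all and evaluate each term from z = x*theta instead. Here -log(h) equals log(1 + exp(-z)) and -log(1 - h) equals log(1 + exp(z)). The quantity log(1 + exp(z)) can be computed as max(z, 0) + log1p(exp(-|z|)) without overflow. This also avoids the Inf that the logical-indexing version above still returns when htheta is exactly 0 for a positive example. A Python sketch (function names hypothetical):

```python
import math


def softplus(z):
    """log(1 + exp(z)) without overflow: max(z, 0) + log1p(exp(-|z|))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def logistic_cost_from_z(z, y):
    """Summed cross-entropy cost computed directly from z = x*theta.

    For y = 1 the term -log(h) equals softplus(-z); for y = 0 the term
    -log(1 - h) equals softplus(z). No log(0) or 0*Inf is ever formed.
    """
    return sum(softplus(-zi) if yi == 1 else softplus(zi) for zi, yi in zip(z, y))


# Extremely confident, correct predictions cost ~0 instead of NaN
print(logistic_cost_from_z([1000.0, -1000.0], [1, 0]))
```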
It happened to me because of an indeterminate form of the type:
0*log(0)
This can happen when one of the predicted values Y equals either 0 or 1.
In my case the solution was to add an if statement to the Python code as follows:
y * np.log(Y) + (1 - y) * np.log(1 - Y) if (Y != 1 and Y != 0) else 0
This way, when the actual value (y) and the predicted value (Y) are equal, no cost is computed, which is the expected behavior.
(Notice that when a given Y converges to 0, the left addend is canceled (because y = 0) and the right addend tends toward 0. The same happens when Y converges to 1, but with the opposite addend.)
(There is also a very rare scenario, which you probably won't need to worry about, where y = 0 and Y = 1 or vice versa. If your dataset is standardized and the weights are properly initialized, it won't be an issue.)
Linearly Non-Separable Binary Classification Problem
First of all, this program isn't working correctly for RBF (gaussianKernel()) and I want to fix it.
It is a non-linear SVM demo that classifies 2 classes with a hard margin.
The problem uses two-dimensional, radially distributed random data.
I used a quadratic programming solver (quadprog) to compute the Lagrange multipliers (alphas).
xn = input .* (output*[1 1]); % xiyi
phi = gaussianKernel(xn, sigma2); % Radial Basis Function
k = phi * phi'; % Symmetric Kernel Matrix For QP Solver
gamma = 1; % Adjusting the upper bound of alphas
f = -ones(2 * len, 1); % Coefficient of sum of alphas
Aeq = output'; % yi
beq = 0; % Sum(ai*yi) = 0
A = zeros(1, 2* len); % A * alpha <= b; no inequality constraint of this form is needed
b = 0; % with A = 0 and b = 0 the constraint 0 <= 0 always holds
lb = zeros(2 * len, 1); % Lower bound of alphas
ub = gamma * ones(2 * len, 1); % Upper bound of alphas
alphas = quadprog(k, f, A, b, Aeq, beq, lb, ub);
To solve this non-linear classification problem, I wrote several kernel functions: Gaussian (RBF), homogeneous polynomial and non-homogeneous polynomial kernels.
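For RBF, I implemented the following function:
$$K(x, x') = \exp\left(-\frac{\lVert x - x' \rVert^2}{2\sigma^2}\right) = \exp\left(-\gamma \lVert x - x' \rVert^2\right), \qquad \gamma = \frac{1}{2\sigma^2}$$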
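Using a Taylor series expansion, it yields:
$$K(x, x') = e^{-\gamma \lVert x \rVert^2}\, e^{-\gamma \lVert x' \rVert^2} \sum_{k=0}^{\infty} \frac{(2\gamma)^k \,(x^\top x')^k}{k!}$$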
And I separated the Gaussian kernel like this:
K(x, x') = phi(x)' * phi(x')
The implementation of this thought is:
function phi = gaussianKernel(x, Sigma2)
gamma = 1 / (2 * Sigma2);
featDim = 10; % Number of Taylor series terms; the terms decay towards 0, so the series is truncated instead of being infinite-dimensional
phi = []; % Kernel Output, The Dimension will be (#Sample) x (featDim*2)
for k = 0 : (featDim - 1)
% Gaussian Kernel Trick Using Taylor Series Expansion
phi = [phi, exp( -gamma .* (x(:, 1)).^2) * sqrt(gamma^2 * 2^k / factorial(k)) .* x(:, 1).^k, ...
exp( -gamma .* (x(:, 2)).^2) * sqrt(gamma^2 * 2^k / factorial(k)) .* x(:, 2).^k];
end
end
*** I think my RBF implementation is wrong, but I don't know how to fix it. Please help me here.
Here is what I got as output:
1) The first image: samples of the classes
2) The second image: support vectors of the classes marked
3) The third image: random test data added
4) The fourth image: classification
Also, I implemented the homogeneous polynomial kernel K(x, x') = (x^T x')^2; the code is:
function phi = quadraticKernel(x)
% 2nd-order homogeneous polynomial kernel
phi = [x(:, 1).^2, sqrt(2).*(x(:, 1).*x(:, 2)), x(:, 2).^2];
end
And I got surprisingly nice output:
To sum up, the program works correctly with the homogeneous polynomial kernel, but not with RBF; something is wrong with the RBF implementation.
If you know about RBF (Gaussian kernel), please let me know how I can make it right.
Edit: If you have the same issue, compute the RBF kernel directly as defined above instead of separating it through phi.
Why do you want to compute phi for the Gaussian kernel? Phi is an infinite-dimensional vector, and you are truncating the Taylor series at 10 terms when we don't even know whether 10 terms are enough to approximate the kernel values! Usually, the kernel is computed directly instead of computing phi (and then k). For example, see [1].
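To see why the number of terms matters, here is a Python sketch (standard library only; function names hypothetical) of the truncated Taylor features for exp(-gamma*(x - x')^2) in one dimension: phi_k(x) = exp(-gamma*x^2) * sqrt((2*gamma)^k / k!) * x^k. The 2-D Gaussian kernel is the product of the per-dimension kernels, so the 2-D feature vector is the outer (tensor) product of the per-dimension features. Concatenating them, as the question's gaussianKernel does, gives the sum of the per-dimension kernels instead. The last check shows that 10 terms are accurate near the origin but not for inputs such as x = x' = 3.

```python
import math


def taylor_features_1d(x, gamma, num_terms):
    """Truncated features for exp(-gamma * (x - x')**2) in one dimension:
    phi_k(x) = exp(-gamma x^2) * sqrt((2 gamma)^k / k!) * x^k."""
    scale = math.exp(-gamma * x * x)
    return [scale * math.sqrt((2 * gamma) ** k / math.factorial(k)) * x ** k
            for k in range(num_terms)]


def taylor_features_2d(point, gamma, num_terms):
    """2-D features: outer product of the per-dimension features."""
    f1 = taylor_features_1d(point[0], gamma, num_terms)
    f2 = taylor_features_1d(point[1], gamma, num_terms)
    return [a * b for a in f1 for b in f2]      # num_terms**2 entries


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


gamma = 0.5
p, q = (0.5, -0.3), (-0.2, 0.4)
exact = math.exp(-gamma * ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2))
approx = dot(taylor_features_2d(p, gamma, 10), taylor_features_2d(q, gamma, 10))

# Far from the origin 10 terms are not enough: the exact kernel value is 1 here
far = dot(taylor_features_1d(3.0, gamma, 10), taylor_features_1d(3.0, gamma, 10))
print(exact, approx, far)
```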
Does this mean we should never compute phi for the Gaussian kernel? Not really, but we have to be slightly smarter about it. Recent works [2,3] show how to compute phi for the Gaussian kernel so that you can compute approximate kernel matrices with finite-dimensional phi's. In [4] I give very simple code that generates the approximate kernel using the trick from the paper. However, in my experiments I needed to generate anywhere from 100 to 10000 dimensional phi's to get a good approximation of the kernel. The exact number depended on the number of features in the original input and on how quickly the eigenvalues of the original matrix taper off.
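The random-features idea from [2] can be sketched as follows (Python, standard library only). This is an illustration, not the code in [4], and the function name is hypothetical. For K(x, x') = exp(-gamma*||x - x'||^2), draw D frequency vectors w with independent N(0, 2*gamma) entries and phases b uniform on [0, 2*pi). Then map z(x) = sqrt(2/D) * cos(w'x + b). The product z(x)'z(x') approximates K(x, x'), and its error shrinks roughly like 1/sqrt(D), which is why a large D is needed.

```python
import math
import random


def random_fourier_features(points, gamma, num_features, seed=0):
    """Map each point to num_features random Fourier features whose dot
    products approximate exp(-gamma * ||x - x'||^2)."""
    rng = random.Random(seed)
    dim = len(points[0])
    std = math.sqrt(2.0 * gamma)                     # w ~ N(0, 2*gamma*I)
    ws = [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(num_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)
    return [[scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
             for w, b in zip(ws, bs)]
            for x in points]


gamma = 0.5
p, q = [0.5, -0.3], [-0.2, 0.4]
zp, zq = random_fourier_features([p, q], gamma, num_features=20000)
approx = sum(a * b for a, b in zip(zp, zq))
self_approx = sum(a * a for a in zp)                  # exact value is 1
exact = math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q)))
print(exact, approx)
```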
For the moment, just use code similar to [1] to generate the Gaussian kernel and then observe the result of the SVM. Also, play around with the gamma parameter; a bad gamma value can result in really bad classification.
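Computing the kernel directly, in the spirit of [1], takes only a few lines. The Python sketch below (standard library only; not a translation of rbf_dot.m, names hypothetical) builds the Gaussian Gram matrix with gamma = 1/(2*Sigma2). It also builds the quadratic term of the standard SVM dual, H(i,j) = y_i*y_j*K(x_i, x_j), which is what a QP solver such as quadprog takes as its first argument. The labels multiply the kernel values. For a non-linear kernel this differs from multiplying the inputs by the labels before applying the feature map.

```python
import math


def gaussian_gram(X, sigma2):
    """K[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma2)), computed directly."""
    gamma = 1.0 / (2.0 * sigma2)
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj)))
             for xj in X] for xi in X]


def svm_dual_hessian(K, y):
    """Quadratic term of the SVM dual: H[i][j] = y_i * y_j * K[i][j]."""
    n = len(y)
    return [[y[i] * y[j] * K[i][j] for j in range(n)] for i in range(n)]


X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]   # hypothetical samples
y = [1, -1, 1]                              # labels in {-1, +1}
K = gaussian_gram(X, sigma2=1.0)
H = svm_dual_hessian(K, y)
```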
[1] https://github.com/ssamot/causality/blob/master/matlab-code/Code/mfunc/indep/HSIC/rbf_dot.m
[2] http://www.eecs.berkeley.edu/~brecht/papers/07.rah.rec.nips.pdf
[3] http://www.eecs.berkeley.edu/~brecht/papers/08.rah.rec.nips.pdf
[4] https://github.com/aruniyer/misc/blob/master/rks.m
The Gaussian kernel is often described as a mapping into an infinite-dimensional space, so I have always had faith in its capacity. The problem here may be due to a bad parameter choice; keep in mind that a grid search is always needed for SVM training. I suggest you take a look here, where you can find some tricks for parameter tuning. Exponentially increasing sequences are usually used as candidates.
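A grid search over exponentially spaced candidates might be sketched like this in Python (standard library only). The exponent ranges for C and gamma are illustrative assumptions. score is a hypothetical callback that would, in practice, return something like the cross-validated accuracy of an SVM trained with the given pair; the toy score here only demonstrates the selection.

```python
import itertools
import math


def exponential_grid(base, first_exp, last_exp, step):
    """Candidates base**e for e = first_exp, first_exp + step, ..., last_exp."""
    return [base ** e for e in range(first_exp, last_exp + 1, step)]


# Assumed, illustrative ranges: C = 2^-5 ... 2^15, gamma = 2^-15 ... 2^3
C_values = exponential_grid(2.0, -5, 15, 2)
gamma_values = exponential_grid(2.0, -15, 3, 2)


def grid_search(score):
    """Return the (C, gamma) pair with the highest score(C, gamma).

    `score` is a hypothetical callback, e.g. cross-validated accuracy.
    """
    return max(itertools.product(C_values, gamma_values), key=lambda cg: score(*cg))


# Toy stand-in score peaking at C = 2^3, gamma = 2^-3 (illustration only)
best = grid_search(lambda C, g: -(math.log2(C) - 3) ** 2 - (math.log2(g) + 3) ** 2)
print(best)
```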
Using normrnd, I would like to generate normally distributed values whose mean and sigma are given as vectors of size 1x45, varying from 1 to 45. I would then like to plot this simulated PDF together with the ideal values.
Whenever I call normrnd as below,
Gaussian = normrnd([1 45],[1 45],[1 500],length(c_t));
I get the following error:
Size information is inconsistent.
The reason for creating this PDF is to model the chemical kinetics of a tracer with a variable Gaussian noise model. I have the ideal characteristics of a tracer; now I would like to add Gaussian noise and understand how the chemical kinetics of the tracer vary with changing noise.
There are different computational models for the chemical kinetics of a tracer: one is the three-compartment model, and others are the shape analysis and constrained shape analysis models.
I currently have the ideal curve for each of these models. Now I would like to add noise to them and understand how each model behaves as the noise varies.
This is why I would like to create a variable noise model with normrnd, add it to the ideal characteristics, and compute noise (sigma) vs. error. This analysis will give me an approximate estimate of how the different models behave with varying noise and which model is suitable for estimating the chemical kinetics of the tracer.
function [c_t,c_t_noise] =Noise_ConstrainedK2(t,a1,a2,a3,b1,b2,b3,td,tmax,k1,k2,k3)
K_1 = (k1*k2)/(k2+k3);
K_2 = (k1*k3)/(k2+k3);
%DV_free= k1/(k2+k3);
c_t = zeros(size(t));
ind = (t > td) & (t < tmax);
c_t(ind)= conv(((t(ind) - td) ./ (tmax - td) * (a1 + a2 + a3)),(K_1*exp(-(k2+k3)*t(ind)+K_2)),'same');
ind = (t >= tmax);
c_t(ind)=conv((a1 * exp(-b1 * (t(ind) - tmax))+ a2 * exp(-b2 * (t(ind) - tmax))) + a3 * exp(-b3 * (t(ind) - tmax)),(K_1*exp(-(k2+k3)*t(ind)+K_2)),'same');
meanAndVar = (rand(45,2)-0.5)*2;
numPoints = 500;
randSamples = zeros(1,numPoints);
for ii = 1:numPoints
idx = mod(ii,size(meanAndVar,1))+1;
randSamples(ii) = normrnd(meanAndVar(idx,1),meanAndVar(idx,2));
c_t_noise = c_t + randSamples(ii);
end
scatter(1:numPoints,randSamples)
dg = [0 0.5 0];
plot(t,c_t,'r');
hold on;
plot(t,c_t_noise,'Color',dg);
hold off;
axis([0 50 0 1900]);
xlabel('Time[mins]');
ylabel('concentration [Mbq]');
title('My signal');
%plot(t,c_tnp);
end
The output of the above function is shown below; I could not see any noise in it.
The closest thing to what you want can be done as follows. It involves a loop, because you cannot draw 500 data points from only 45 different means and standard deviations without revisiting some of the (mean, sigma) sets.
This is my interpretation of what you want, though I am still not entirely sure.
Random Gaussian Function Selection
meanAndVar = rand(45,2);
numPoints = 500;
randSamples = zeros(1,numPoints);
for ii = 1:numPoints
randMeanVarIdx = randi([1,size(meanAndVar,1)]);
randSamples(ii) = normrnd(meanAndVar(randMeanVarIdx,1),meanAndVar(randMeanVarIdx,2));
end
scatter(1:numPoints,randSamples)
The above code generates a random 2-D matrix of means and standard deviations (1st column = mean, 2nd column = standard deviation, which is what normrnd expects). We then preallocate some space.
Inside the loop we choose a random row (uniformly), pass its mean and standard deviation to normrnd to draw a random Gaussian value, and store that value.
The vector randSamples will contain random values, each drawn from a Gaussian distribution chosen uniformly at random.
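The random selection loop can be mirrored in Python as a sketch (standard library only; names hypothetical). random.Random.choice plays the role of randi over the rows, and random.Random.gauss takes a mean and a standard deviation, just as normrnd does.

```python
import random


def sample_random_pairs(mean_and_sigma, num_points, seed=0):
    """Draw num_points values, each from a (mean, sigma) row chosen uniformly."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_points):
        mu, sigma = rng.choice(mean_and_sigma)   # uniform choice of a row
        samples.append(rng.gauss(mu, sigma))
    return samples


rng = random.Random(1)
# Like rand(45,2): means and standard deviations in [0, 1)
mean_and_sigma = [(rng.random(), rng.random()) for _ in range(45)]
samples = sample_random_pairs(mean_and_sigma, 500)
```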
Sequential Function Selection
If you do not want to randomly select which distribution to use and just want to go through them sequentially, you can loop and use the modulus to get the index of the row to use.
meanAndVar = [(rand(45,1)-0.5)*2, rand(45,1)]; % means in [-1,1]; standard deviations in [0,1), since sigma must be non-negative
numPoints = 500;
randSamples = zeros(1,numPoints);
for ii = 1:numPoints
idx = mod(ii-1,size(meanAndVar,1))+1; % cycles 1, 2, ..., 45, 1, 2, ...
randSamples(ii) = normrnd(meanAndVar(idx,1),meanAndVar(idx,2));
end
scatter(1:numPoints,randSamples)
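The sequential variant in Python (standard library only; names hypothetical) is a sketch of the same idea. Python indices start at 0, so i % n already cycles through the rows starting from the first one. The second column is kept non-negative because it is used as a standard deviation.

```python
import random


def sample_cyclic_pairs(mean_and_sigma, num_points, seed=0):
    """Draw num_points values, cycling through the (mean, sigma) rows in order."""
    rng = random.Random(seed)
    n = len(mean_and_sigma)
    indices = [i % n for i in range(num_points)]   # 0, 1, ..., n-1, 0, 1, ...
    return indices, [rng.gauss(*mean_and_sigma[i]) for i in indices]


rng = random.Random(2)
# Means in [-1, 1]; standard deviations in [0, 1)
mean_and_sigma = [((rng.random() - 0.5) * 2, rng.random()) for _ in range(45)]
indices, samples = sample_cyclic_pairs(mean_and_sigma, 500)
```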
The problem with this statement
Gaussian = normrnd([1 45],[1 45],[1 500],length(c_t));
is that you supply two mu values and two sigma values, and ask for a matrix of size [1 500] x length(c_t). You need to pass the size in a uniform way, so either
Gaussian = normrnd(mu, sigma,[500 length(c_t)]);
or
Gaussian = normrnd(mu, sigma, 500, length(c_t));
Then make sure that the sizes of the mu and sigma arrays match the size of the matrix you ask for. So if you want a 500 x length(c_t) matrix as output, you need to pass 500 x length(c_t) (mu, sigma) pairs. If you only want to vary one of mu or sigma, you can pass a single value for the other parameter.
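The size rule can be illustrated with a small Python helper (hypothetical, standard library only). It mimics the rule described above rather than reproducing normrnd. Each parameter is either a scalar, which is expanded, or an array that already has the requested size; anything else is rejected, much like the "Size information is inconsistent" error.

```python
import random


def normal_matrix(mu, sigma, rows, cols, seed=0):
    """rows x cols normal draws; mu and sigma are scalars or rows x cols lists.

    Hypothetical helper: a non-scalar parameter must already have the
    requested size, otherwise ValueError is raised.
    """
    def expand(param, name):
        if isinstance(param, (int, float)):
            return [[param] * cols for _ in range(rows)]
        if len(param) != rows or any(len(r) != cols for r in param):
            raise ValueError(f"size of {name} does not match {rows}x{cols}")
        return param

    rng = random.Random(seed)
    mus, sigmas = expand(mu, "mu"), expand(sigma, "sigma")
    return [[rng.gauss(mus[i][j], sigmas[i][j]) for j in range(cols)]
            for i in range(rows)]


# Scalar mu, per-row sigma growing from 1 to 5 (500 rows, 3 columns)
sigma = [[1 + 4 * i / 499] * 3 for i in range(500)]
G = normal_matrix(0.0, sigma, 500, 3)
```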
To get N values from a normal distribution with fixed mean and steadily increasing sigma you can do
noise = @(mu, s0, s1, n) normrnd(mu, linspace(s0, s1, n), 1, n)
where s0 is the lowest sigma value and s1 is the largest sigma value. To get 10 values drawn from distributions with mu=0 and sigma increasing from 1 to 5 you can do
noise(0,1,5,10)
If you want to introduce some randomness in the increase of sigma you can do
noise_rand = @(mu, s0, s1, n) normrnd(mu, linspace(s0, s1, n) .* rand(1,n), 1, n)
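For the tracer use case, the increasing-sigma noise can be added element-wise to an ideal curve, one noise value per time sample, rather than adding a single scalar to the whole curve. A Python sketch (standard library only; the helper and the ideal curve are hypothetical) covering both the plain and the randomized variant:

```python
import random


def increasing_sigma_noise(mu, s0, s1, n, seed=0, jitter=False):
    """Return (sigmas, draws): n normal draws with mean mu and sigma rising
    linearly from s0 to s1, like normrnd(mu, linspace(s0, s1, n), 1, n).

    With jitter=True each sigma is also scaled by a uniform [0, 1) factor,
    like linspace(s0, s1, n) .* rand(1, n).
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    rng = random.Random(seed)
    sigmas = [s0 + (s1 - s0) * k / (n - 1) for k in range(n)]
    if jitter:
        sigmas = [s * rng.random() for s in sigmas]
    return sigmas, [rng.gauss(mu, s) for s in sigmas]


# Hypothetical ideal curve sampled at 10 time points
ideal = [100.0 * k for k in range(10)]
sigmas, noise = increasing_sigma_noise(0.0, 1.0, 5.0, len(ideal))
# One noise value per time sample, added element-wise
noisy = [c + e for c, e in zip(ideal, noise)]
```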