Integration via trapezoidal sums in MATLAB

I need help finding an integral of a function using trapezoidal sums.
The program should take successive trapezoidal sums with n = 1, 2, 3, ...
subintervals until the sums for two neighbouring values of n differ by less than a given tolerance. I want at least one FOR loop within a WHILE loop and I don't want to use the trapz function. The program takes four inputs:
f: A function handle for a function of x.
a: A real number.
b: A real number larger than a.
tolerance: A real number that is positive and very small
The problem I have is trying to implement the formula for trapezoidal sums which is
(Δx/2)[y_0 + 2y_1 + 2y_2 + … + 2y_{n-1} + y_n]
Here is my code, and the area I'm stuck in is the "sum" part within the FOR loop. I'm trying to sum up 2y2 + 2y3....2yn-1 since I already accounted for 2y1. I get an answer, but it isn't as accurate as it should be. For example, I get 6.071717974723753 instead of 6.101605982576467.
Thanks for any help!
function t=trapintegral(f,a,b,tol)
format compact; format long;
syms x;
oldtrap = ((b-a)/2)*(f(a)+f(b));
n = 2;
h = (b-a)/n;
newtrap = (h/2)*(f(a)+(2*f(a+h))+f(b));
while (abs(newtrap-oldtrap)>=tol)
oldtrap = newtrap;
for i=[3:n]
dx = (b-a)/n;
trapezoidsum = (dx/2)*(f(x) + (2*sum(f(a+(3:n-1))))+f(b));
newtrap = trapezoidsum;
end
end
t = newtrap;
end

The reason why this code isn't working is because there are two slight errors in your summation for the trapezoidal rule. What I am precisely referring to is this statement:
trapezoidsum = (dx/2)*(f(x) + (2*sum(f(a+(3:n-1))))+f(b));
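Recall the equation for the trapezoidal integration rule on a uniform grid with x_1 = a and x_{N+1} = b (source: Wikipedia):
\int_a^b f(x)\,dx \approx \frac{b-a}{2N}\left[ f(x_1) + 2f(x_2) + \dots + 2f(x_N) + f(x_{N+1}) \right]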
For the first error, f(x) should be f(a) as you are including the starting point, and shouldn't be left as symbolic. In fact, you should simply get rid of the syms x statement as it is not useful in your script. a corresponds to x1 by consulting the above equation.
The next error is the second term. You actually need to multiply your index values (3:n-1) by dx. Also, this should actually go from (1:n-1) and I'll explain later. The equation above goes from 2 to N, but for our purposes, we are going to go from 1 to N-1 as you have your code set up like that.
Remember, in the trapezoidal rule, you are subdividing the finite interval into n pieces. The ith piece is defined as:
x_i = a + dx*i; ,
where i goes from 1 up to N-1. Note that this starts at 1 and not 3. The reason why is because the first piece is already taken into account by f(a), and we only count up to N-1 as piece N is accounted by f(b). For the equation, this goes from 2 to N and by modifying the code this way, this is precisely what we are doing in the end.
Therefore, your statement actually needs to be:
trapezoidsum = (dx/2)*(f(a) + (2*sum(f(a+dx*(1:n-1))))+f(b));
Try this and let me know if you get the right answer. FWIW, MATLAB already implements trapezoidal integration through trapz, as @ADonda already pointed out. However, you need to properly structure your x and y values before you use it: set up dx beforehand, calculate your x points using the x_i equation that I specified above, then use these to generate your y values. You then use trapz to calculate the area. In other words:
dx = (b-a) / n;
x = a + dx*(0:n);
y = f(x);
trapezoidsum = trapz(x,y);
You can use the above code as a reference to see if you are implementing the trapezoidal rule correctly. Your implementation and using the above code should generate the same results. All you have to do is change the value of n, then run this code to generate the approximation of the area for different subdivisions underneath your curve.
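As a quick cross-check of the two forms above, here is a minimal sketch that evaluates both for one illustrative case. The integrand, the limits and n = 50 are assumptions picked only for the demonstration; any vectorized function handle should behave the same way.

```matlab
% Compare the hand-written composite trapezoidal sum with trapz.
% f, a, b and n are illustrative assumptions, not values from the question.
f = @(x) (x + x.^2).^(1/3);   % element-wise operators so f accepts vectors
a = 1;
b = 4;
n = 50;                        % number of subintervals
dx = (b - a)/n;

% Hand-written rule: end points once, interior points twice
manual = (dx/2)*(f(a) + 2*sum(f(a + dx*(1:n-1))) + f(b));

% Same grid passed to trapz
x = a + dx*(0:n);
builtin = trapz(x, f(x));

fprintf('manual = %.15f\ntrapz  = %.15f\ndiff   = %.2e\n', ...
    manual, builtin, abs(manual - builtin));
```

The two numbers should agree to within rounding error; a larger difference usually points at the interior index range or a missing factor of dx.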
Edit - August 17th, 2014
I figured out why your code isn't working. Here are the reasons why:
The for loop is unnecessary. Take a look at the for loop iteration. You have a loop going from i = [3:n] yet you don't reference the i variable at all in your loop. As such, you don't need this at all.
You are not computing successive sums properly. After you compute the trapezoidal sum with n subintervals, you need to increment n and then compute the trapezoidal rule again. n is never incremented in your while loop, which is why your area is never improving.
You need to save the previous area inside the while loop, then when you compute the next area, that's when you determine whether or not the difference between the areas is less than the tolerance. We can also get rid of the code at the beginning that tries to compute the area for n = 2. That's not needed, as we can place this inside your while loop. As such, this is what your code should look like:
function t=trapintegral(f,a,b,tol)
format long; %// Got rid of format compact. Useless
%// n starts at 2 - Also removed syms x - Useless statement
n = 2;
newtrap = ((b-a)/2)*(f(a) + f(b)); %// Initialize
oldtrap = 0; %// Initialize to 0
while (abs(newtrap-oldtrap)>=tol)
oldtrap = newtrap; %//Save the old area from the previous iteration
dx = (b-a)/n; %//Compute width
%//Determine sum
trapezoidsum = (dx/2)*(f(a) + (2*sum(f(a+dx*(1:n-1))))+f(b));
newtrap = trapezoidsum; % //This is the new sum
n = n + 1; % //Go to the next value of n
end
t = newtrap;
end
By running your code, this is what I get:
trapezoidsum = trapintegral(@(x) (x+x.^2).^(1/3),1,4,0.00001)
trapezoidsum =
6.111776299189033
Caveat
Look at the way I defined your function. You must use element-by-element operations as the sum command inside the loop will be vectorized. Take a look at the ^ operations specifically. You need to prepend a dot to the operations. Once you do this, I get the right answer.
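To illustrate the caveat, here is a small sketch that contrasts the element-wise form with the matrix-power form on a vector input. The names f_ok, f_bad and v are hypothetical and exist only for this demonstration.

```matlab
% Element-wise (.^) versus matrix power (^) on a vector argument.
% f_ok, f_bad and v are hypothetical names used only for this illustration.
f_ok  = @(x) (x + x.^2).^(1/3);   % works for scalars and vectors
f_bad = @(x) (x + x^2)^(1/3);     % only valid for scalar x

v = [1 2 3];
y = f_ok(v);                       % one output value per input value
disp(y);

failed = false;
try
    f_bad(v);                      % x^2 on a 1-by-3 vector is a matrix power
catch err
    failed = true;
    disp(['f_bad failed: ' err.message]);
end
```

Since the sum inside the loop calls f on the whole vector a+dx*(1:n-1), only the element-wise version is usable there.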
Edit #2 - August 18th, 2014
You said you want at least one for loop. This is highly inefficient, and whoever specified having one for loop in the code really doesn't know how MATLAB works. Nevertheless, you can use the for loop to accumulate the sum term. As such:
function t=trapintegral(f,a,b,tol)
format long; %// Got rid of format compact. Useless
%// n starts at 2 - Also removed syms x - Useless statement
n = 2;
%// Compute for n = 1 first, then proceed if we don't get a better
%// difference tolerance
newtrap = ((b-a)/2)*(f(a) + f(b)); %// Initialize
oldtrap = 0; %// Initialize to 0
while (abs(newtrap-oldtrap)>=tol)
oldtrap = newtrap; %//Save the old area from the previous iteration
dx = (b-a)/n; %//Compute width
%//Determine sum
%// Initialize
trapezoidsum = (dx/2)*(f(a) + f(b));
%// Accumulate sum terms
%// Note that we multiply each term by (dx/2), but because of the
%// factor of 2 for each of these terms, these cancel and we thus have dx
for n2 = 1 : n-1
trapezoidsum = trapezoidsum + dx*f(a + dx*n2);
end
newtrap = trapezoidsum; % //This is the new sum
n = n + 1; % //Go to the next value of n
end
t = newtrap;
end
Good luck!

Related

How to setup equation that involves a sum from x=1 to infinity and loops?

I am getting confused on how to properly set up this equation. To find a value of V(i,j). The end result would be plotting V over time. I understand that there needs to be loops to allow this equation to work, however I am lost when it comes to setting it up. Basically I am trying to take the sum from n=1 to infinity of (1-(-1)^n)/(n^4 *pi^4)*sin((n*pi*c*j)/L)*sin((n*pi*i)/L)
I originally thought that I should make it a while loop to increment n by 1 until I reach say 10 or so just to get an idea of what the output would look like. All of the variables were unknown and values were added again to see what the plot would look like.
I have written another piece of code where the equation depends only on i and j. However, with this n term, I am thrown off. Any advice on setting up the equation would be great. Thank you.
L=10;
x=linspace(0,L,30);
t1= 50;
X=30;
p=1
c=t1/1000;
V=zeros(X,t1);
V(1,:)=0;
V(30,:)=0;
R=((4*p*L^3)/c);
n=1;
t=1:50;
while n < 10
for i=1:31
for j=1:50
V(i,j)=R*sum((1-(-1)^n)/(n^4 *pi^4)*sin((n*pi*c*j)/L)*sin((n*pi*i)/L));
end
end
n=n+1;
end
figure(1)
plot(V(i,j),t)
Various ways of doing so:
1) Computing the sum up to one Nmax in one shot:
Nmax = 30;
Vijn = @(i,j,n) R*((1-(-1)^n)/(n^4 *pi^4)*sin((n*pi*c*j)/L)*sin((n*pi*i)/L));
i = 1:31;
j = 1:50;
n = 1:Nmax;
[I,J,N] = ndgrid(i,j,n);
V = arrayfun(Vijn,I,J,N);
Vc = cumsum(V,3);
% now Vc(:,:,k) is sum_{n=1}^{k} V(i,j,n)
figure(1);clf;imagesc(Vc(:,:,end));
2) Looping indefinitely
n = 1;
V = 0;
i = 1:31;
j = 1:50;
[I,J] = meshgrid(i,j);
while true
V = V + R*((1-(-1)^n)/(n^4 *pi^4)*sin((n*pi*c*J)/L).*sin((n*pi*I)/L));
n = n + 1;
figure(1);clf;
imagesc(V);
title(sprintf('N = %d',n))
drawnow;
pause(0.25);
end
Note that in your example you won't need many terms, since:
Every second term is zero (for even n, the term 1-(-1)^n is zero).
The terms decay with 1/n^4. In norms: n=1 contributes ~2e4, n=3 contributes ~4e2, n=5 contributes 5e1, n=7 contributes ~14, etc. Visually, there is a small difference between n=1 and n=1+n=3 but barely a noticeable one for n=1+n=3+n=5.
Given that so few terms are needed, the first approach is probably the better one. Also, skip the even indices, as you don't need them.
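Skipping the even indices could look like the following sketch. It assumes the values from the question (L = 10, p = 1, c = t1/1000 = 0.05) and an arbitrary cut-off of n = 9; for odd n the factor 1-(-1)^n is simply 2.

```matlab
% Partial sum over odd n only, accumulated on a full (i,j) grid.
% Parameter values are taken from the question; the cut-off nMax is an assumption.
L = 10;
p = 1;
c = 50/1000;
R = (4*p*L^3)/c;
nMax = 9;                         % arbitrary cut-off for the illustration

[I, J] = ndgrid(1:31, 1:50);      % I and J are both 31-by-50
V = zeros(size(I));
for n = 1:2:nMax                  % odd n only: 1-(-1)^n equals 2
    V = V + R*(2/(n^4*pi^4)) * sin(n*pi*c*J/L) .* sin(n*pi*I/L);
end

disp(size(V));
```

Calling imagesc(V) afterwards displays the result in the same way as the snippets above.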

Solving for the square root by Newton's Method

yinitial = x
y_n approaches sqrt(x) as n->infinity
There is an x input and a tol input. As long as |y^2-x| > tol is true, compute y=0.5*(y + x/y). How would I create a while loop that stops when |y^2-x| <= tol, so that the y value changes every time through the loop? I should get this answer:
>>sqrtx = sqRoot(25,100)
sqrtx =
7.4615
I wrote this so far:
function [sqrtx] = sqrRoot(x,tol)
n = 0;
x=0;%initialized variables
if x >=tol %skips all remaining code
return
end
while x <=tol
%code repeated during each loop
x = x+1 %counting code
end
That formula is using a modified version of Newton's method to determine the square root. y_n is the previous iteration and y_{n+1} is the current iteration. You just need to keep two variables for each, then when the criteria of tolerance is satisfied, you return the current iteration's output. You also are incrementing the wrong value. It should be n, not x. You also aren't computing the tolerance properly... read the question more carefully. You take the current iteration's output, square it, subtract with the desired value x, take the absolute value and see if the output is less than the tolerance.
Also, you need to make sure the tolerance is small. Specifying the tolerance to be 100 will probably not allow the algorithm to iterate and give you the right answer. It may also be useful to see how long it took to converge to the right answer. As such, return n as a second output to your function:
function [sqrtx,n] = sqrRoot(x,tol) %// Change
%// Counts total number of iterations
n = 0;
%// Initialize the previous and current value to the input
sqrtx = x;
sqrtx_prev = x;
%// Until the tolerance has been met...
while abs(sqrtx^2 - x) > tol
%// Compute the next guess of the square root
sqrtx = 0.5*(sqrtx_prev + (x/sqrtx_prev));
%// Increment the counter
n = n + 1;
%// Set for next iteration
sqrtx_prev = sqrtx;
end
Now, when I run this code with x=25 and tol=1e-10, I get this:
>> [sqrtx, n] = sqrRoot(25, 1e-10)
sqrtx =
5
n =
7
The square root of 25 is 5... at least that's what I remember from maths class back in the day. It also took 7 iterations to converge. Not bad.
Yes, that is exactly what you are supposed to do: Iterate using the equation for y_{n+1} over and over again.
In your code you should have a loop like
while abs(y^2 - x) > tol
%// Calculate new y from the formula
end
Also note that tol should be small, as told in the other answer. The parameter tol actually tells you how inaccurate you want your solution to be. Normally you want more or less accurate solutions, so you set tol to a value near zero.
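To see the effect of tol, the following sketch traces the iteration for the values in the question (x = 25, tol = 100). It is written as a plain script rather than a function so it can be pasted directly into the command window.

```matlab
% Trace the iteration y = 0.5*(y + x/y) with the loose tolerance from the question.
x = 25;
tol = 100;
y = x;        % initial guess, as stated in the question
n = 0;        % iteration counter
while abs(y^2 - x) > tol
    y = 0.5*(y + x/y);
    n = n + 1;
    fprintf('n = %d, y = %.4f\n', n, y);
end
```

With this tolerance the loop stops after two passes at y of about 7.4615, the value quoted in the question, because |y^2 - 25| is already below 100 at that point.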
The correct way to solve this..
function [sqrtx] = sqRoot(x,tol)
sqrtx = x;%output = x
while abs((sqrtx.^2) - x) > tol %logic expression to test when it should end
sqrtx = 0.5*((sqrtx) + (x/sqrtx)); %while condition proves true, calculate
end
end

Calculating Errors of the Trapezoidal Rule in MATLAB

I'm trying to calculate how the errors depend on the step, h, for the trapezoidal rule. The errors should get smaller with a smaller value of h, but for me this doesn't happen. This is my code:
Iref is a reference value calculated and verified with Simpson's method and the MATLAB function quad, respectively
for h = 0.01:0.1:1
x = a:h:b;
v = y(x);
Itrap = (sum(v)-v(1)/2-v(end)/2)*h;
Error = abs(Itrap-Iref)
end
I think there's something wrong with the way I'm using h, because the trapezoidal rule works for known integrals. I would be really happy if someone could help me with this, because I can't understand why the errors are "jumping around" the way they do.
I wonder if maybe part of the problem is that not all intervals - for each step size h - have the same a and b just because of the way that x is constructed. Try the following with the additional fprintf statement:
for h = 0.01:0.1:1
x = a:h:b;
fprintf('a=%f b=%f\n',x(1),x(end));
v = y(x);
Itrap = (sum(v)-v(1)/2-v(end)/2)*h;
Error = abs(Itrap-Iref);
end
Depending upon your a and b (I chose a=0 and b=5) all the a values were identical (as expected) but the b varied from 4.55 to 5.0.
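The effect can be isolated with a short sketch that only records where each grid ends. It assumes a = 0 and b = 5 as in the test described above; hs and lastPt are hypothetical names.

```matlab
% Record the last grid point produced by a:h:b for each step size.
% a = 0 and b = 5 match the values chosen in the answer; other names are illustrative.
a = 0;
b = 5;
hs = 0.01:0.1:1;
lastPt = zeros(size(hs));
for k = 1:numel(hs)
    x = a:hs(k):b;          % stops at the last multiple of h that does not exceed b
    lastPt(k) = x(end);
end
% Columns: step size, last grid point, part of [a,b] left uncovered
disp([hs(:), lastPt(:), b - lastPt(:)]);
```

Whenever b - a is not a multiple of h, the third column is non-zero, so the rule is integrating over a shorter interval than the one used for Iref.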
I think that you always want to keep the interval [a,b] the same for each step size that you choose in order to get a better comparison between each iteration. So rather than iterating over the step size, you could instead iterate over n, the number of equally spaced points in [a,b] (which gives n-1 sub-intervals).
Rather than
for h = 0.01:0.1:1
x = a:h:b;
you could do something more like
% iterate over each value of n, chosen so that the step size
% is similar to what you had before
for n = [501 46 24 17 13 10 9 8 7 6]
% create an equally spaced vector of n numbers between a and b
x = linspace(a,b,n);
% get the step delta
h = x(2)-x(1);
v = y(x);
Itrap = (sum(v)-v(1)/2-v(end)/2)*h;
Error = abs(Itrap-Iref);
fprintf('a=%f b=%f len=%d h=%f Error=%f\n',x(1),x(end),length(x),h,Error);
end
When you evaluate the above code, you will notice that a and b are consistent for each iteration, h is roughly what you chose before, and the Error does increase as the step size increases.
Try the above and see what happens!
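For a self-contained check of the expected behaviour, here is a sketch on an integral with a known value. The choice of sin(x) on [0, pi] (exact value 2) and the point counts in nPts are assumptions made for the illustration.

```matlab
% Error of the trapezoidal rule on a known integral, with fixed end points.
% Integrand, interval and point counts are illustrative assumptions.
y = @(x) sin(x);
a = 0;
b = pi;
Iref = 2;                          % exact value of the integral of sin on [0, pi]
nPts = [11 21 41 81];              % each step halves h
err = zeros(size(nPts));
for k = 1:numel(nPts)
    x = linspace(a, b, nPts(k));
    h = x(2) - x(1);
    v = y(x);
    Itrap = (sum(v) - v(1)/2 - v(end)/2)*h;
    err(k) = abs(Itrap - Iref);
end
ratios = err(1:end-1)./err(2:end); % error ratio between successive step sizes
disp(err);
disp(ratios);
```

Because the rule is second-order accurate for smooth integrands, halving h should reduce the error by a factor close to 4.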

Matlab -- random walk with boundaries, vectorized

Suppose I have a vector J of jump sizes and an initial starting point X_0. Also I have boundaries 0, B (assume 0 < X_0 < B). I want to do a random walk where X_i = [min(X_{i-1} + J_i,B)]^+. (positive part). Basically if it goes over a boundary, it is made equal to the boundary. Anyone know a vectorized way to do this? The current way I am doing it consists of doing cumsums and then finding places where it violates a condition, and then starting from there and repeating the cumsum calculation, etc until I find that I stop violating the boundaries. It works when the boundaries are rarely hit, but if they are hit all the time, it basically becomes a for loop.
In the code below, I am doing this across many samples. To 'fix' the ones that go out of the boundary, I have to loop through the samples to check...(don't think there is a vectorized 'find')
% X_init is a row vector describing initial resource values to use for
% each sample
% J is matrix where each col is a sequence of Jumps (columns = sample #)
% In this code the jumps are subtracted, but same thing
X_intvl = repmat(X_init,NumJumps,1) - cumsum(J);
X = [X_init; X_intvl];
for sample = 1:NumSamples
k = find(or(X_intvl(:,sample) > B, X_intvl(:,sample) < 0),1);
while(~isempty(k))
change = X_intvl(k-1,sample) - X_intvl(k,sample);
X_intvl(k:end,sample) = X_intvl(k:end,sample)+change;
k = find(or(X_intvl(:,sample) > B, X_intvl(:,sample) < 0),1);
end
end
Interesting question (+1).
I faced a similar problem a while back, although slightly more complex as my lower and upper bound depended on t. I never did work out a fully-vectorized solution. In the end, the fastest solution I found was a single loop which incorporates the constraints at each step. Adapting the code to your situation yields the following:
%# Set the parameters
LB = 0; %# Lower bound
UB = 5; %# Upper bound
T = 100; %# Number of observations
N = 3; %# Number of samples
X0 = (1/2) * (LB + UB); %# Arbitrary start point halfway between LB and UB
%# Generate the jumps
Jump = randn(N, T-1);
%# Build the constrained random walk
X = X0 * ones(N, T);
for t = 2:T
X(:, t) = max(min(X(:, t-1) + Jump(:, t-1), UB), LB);
end
X = X';
I would be interested in hearing if this method proves faster than what you are currently doing. I suspect it will be for cases where the constraint is binding in more than one or two places. I can't test it myself as the code you provided is not a "working" example, i.e. I can't just copy and paste it into Matlab and run it, as it depends on several variables for which example (or simulated) values are not provided. I tried adapting it myself, but couldn't get it to work properly.
UPDATE: I just switched the code around so that observations are indexed on columns and samples are indexed on rows, and then I transpose X in the last step. This will make the routine more efficient, since Matlab allocates memory for numeric arrays column-wise - hence it is faster when performing operations down the columns of an array (as opposed to across the rows). Note, you will only notice the speed-up for large N.
FINAL THOUGHT: These days, the JIT accelerator is very good at making single loops in Matlab efficient (double loops are still pretty slow). Therefore personally I'm of the opinion that every time you try and obtain a fully-vectorized solution in Matlab, ie no loops, you should weigh up whether the effort involved in finding a clever solution is worth the slight gains in efficiency to be made over an easier-to-obtain method that utilizes a single loop. And it is important to remember that fully-vectorized solutions are sometimes slower than solutions involving single loops when T and N are small!
I'd like to propose another vectorized solution.
So, first we should set the parameters and generate random jumps. I used the same set of parameters as Colin T Bowers, with larger values for UB and T:
% Set the parameters
LB = 0; % Lower bound
UB = 20; % Upper bound
T = 1000; % Number of observations
N = 3; % Number of samples
X0 = (1/2) * (UB + LB); % Arbitrary start point halfway between LB and UB
% Generate the jumps
Jump = randn(N, T-1);
But I changed the generation code:
% Generate initial data without bounds
X = cumsum(Jump, 2);
% Apply bounds
Amplitude = UB - LB;
nsteps = ceil( max(abs(X(:))) / Amplitude - 0.5 );
for ii = 1:nsteps
ind = abs(X) > (1/2) * Amplitude;
X(ind) = Amplitude * sign(X(ind)) - X(ind);
end
% Shifting X
X = X0 + X;
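So, instead of stepping through time in a for loop, I'm using the cumsum function with smart post-processing. Note that the post-processing reflects the walk back off the bounds rather than holding it at the bound.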
N.B. This solution works significantly slower than Colin T Bowers's one for tight bounds (Amplitude < 5), but for loose bounds (Amplitude > 20) it works much faster.

How can I speed up this call to quantile in Matlab?

I have a MATLAB routine with one rather obvious bottleneck. I've profiled the function, with the result that 2/3 of the computing time is used in the function levels.
The function levels takes a matrix of floats and splits each column into nLevels buckets, returning a matrix of the same size as the input, with each entry replaced by the number of the bucket it falls into.
To do this I use the quantile function to get the bucket limits, and a loop to assign the entries to buckets. Here's my implementation:
function [Y q] = levels(X,nLevels)
% "Assign each of the elements of X to an integer-valued level"
p = linspace(0, 1.0, nLevels+1);
q = quantile(X,p);
if isvector(q)
q=transpose(q);
end
Y = zeros(size(X));
for i = 1:nLevels
% "The variables g and l indicate the entries that are respectively greater than
% or less than the relevant bucket limits. The line Y(g & l) = i is assigning the
% value i to any element that falls in this bucket."
if i ~= nLevels % "The default; doesnt include upper bound"
g = bsxfun(@ge,X,q(i,:));
l = bsxfun(@lt,X,q(i+1,:));
else % "For the final level we include the upper bound"
g = bsxfun(@ge,X,q(i,:));
l = bsxfun(@le,X,q(i+1,:));
end
Y(g & l) = i;
end
Is there anything I can do to speed this up? Can the code be vectorized?
If I understand correctly, you want to know how many items fell in each bucket.
Use:
n = hist(Y,nbins)
Though I am not sure that it will help in the speedup. It is just cleaner this way.
Edit : Following the comment:
You can use the second output parameter of histc
[n,bin] = histc(...) also returns an index matrix bin. If x is a vector, n(k) = sum(bin==k). bin is zero for out of range values. If x is an M-by-N matrix, then
How about this:
function [Y q] = levels(X,nLevels)
p = linspace(0, 1.0, nLevels+1);
q = quantile(X,p);
Y = zeros(size(X));
for i = 1:numel(q)-1
Y = Y + (X >= q(i)); % parentheses needed: + binds tighter than >=
end
This results in the following:
>>X = [3 1 4 6 7 2];
>>[Y, q] = levels(X,2)
Y =
1 1 2 2 2 1
q =
1 3.5 7
You could also modify the logic line to ensure values are less than the start of the next bin. However, I don't think it is necessary.
I think you should use histc
[~,Y] = histc(X,q)
As you can see in matlab's doc:
Description
n = histc(x,edges) counts the number of values in vector x that fall
between the elements in the edges vector (which must contain
monotonically nondecreasing values). n is a length(edges) vector
containing these counts. No elements of x can be complex.
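A small sketch of the second output of histc, using the vector and quantile edges from the example earlier in this thread (the edges are hard-coded so the snippet does not need the quantile function). The variable qOpen is a hypothetical name.

```matlab
% Bin indices from the second output of histc.
% X and q are copied from the earlier example; qOpen is an illustrative name.
X = [3 1 4 6 7 2];
q = [1 3.5 7];                 % edges for two levels

[~, Y] = histc(X, q);
disp(Y);                       % the maximum (7) matches the last edge and gets its own bin

% Replacing the last edge with Inf folds the maximum into the final level
qOpen = q;
qOpen(end) = Inf;
[~, Y2] = histc(X, qOpen);
disp(Y2);
```

Values equal to the last edge are counted in an extra bin of their own, which is why the top edge has to be adjusted if exactly nLevels levels are wanted.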
I made a couple of refinements (including one inspired by Aero Engy in another answer) that have resulted in some improvements. To test them out, I created a random matrix of a million rows and 100 columns to run the improved functions on:
>> x = randn(1000000,100);
First, I ran my unmodified code, with the following results:
Note that of the 40 seconds, around 14 of them are spent computing the quantiles - I can't expect to improve this part of the routine (I assume that Mathworks have already optimized it, though I guess that to assume makes an...)
Next, I modified the routine to the following, which should be faster and has the advantage of being fewer lines as well!
function [Y q] = levels(X,nLevels)
p = linspace(0, 1.0, nLevels+1);
q = quantile(X,p);
if isvector(q), q = transpose(q); end
Y = ones(size(X));
for i = 2:nLevels
Y = Y + bsxfun(@ge,X,q(i,:));
end
The profiling results with this code are:
So it is 15 seconds faster, which represents a 150% speedup of the portion of code that is mine, rather than MathWorks.
Finally, following a suggestion of Andrey (again in another answer) I modified the code to use the second output of the histc function, which assigns entries to bins. It doesn't treat the columns independently, so I had to loop over the columns manually, but it seems to be performing really well. Here's the code:
function [Y q] = levels(X,nLevels)
p = linspace(0,1,nLevels+1);
q = quantile(X,p);
if isvector(q), q = transpose(q); end
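q(end,:) = Inf; % open the last bin so each column maximum lands in level nLevels (doubling the edge fails when the maximum is not positive)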
Y = zeros(size(X));
for k = 1:size(X,2)
[junk Y(:,k)] = histc(X(:,k),q(:,k));
end
And the profiling results:
We now spend only 4.3 seconds in code outside the quantile function, which is around a 500% speedup over what I wrote originally. I've spent a bit of time writing this answer because I think it's turned into a nice example of how you can use the MATLAB profiler and StackExchange in combination to get much better performance from your code.
I'm happy with this result, although of course I'll continue to be pleased to hear other answers. At this stage the main performance increase will come from increasing the performance of the part of the code that currently calls quantile. I can't see how to do this immediately, but maybe someone else here can. Thanks again!
You can sort the columns and divide+round the inverse indexes:
function Y = levels(X,nLevels)
% "Assign each of the elements of X to an integer-valued level"
[S,IX]=sort(X);
[grid1,grid2]=ndgrid(1:size(IX,1),1:size(IX,2));
invIX=zeros(size(X));
invIX(sub2ind(size(X),IX(:),grid2(:)))=grid1;
Y=ceil(invIX/size(X,1)*nLevels);
Or you can use tiedrank:
function Y = levels(X,nLevels)
% "Assign each of the elements of X to an integer-valued level"
R=tiedrank(X);
Y=ceil(R/size(X,1)*nLevels);
Surprisingly, both these solutions are slightly slower than the quantile+histc solution.
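For a single column, the rank-based idea reduces to a few lines. This is a sketch on the small vector used earlier in the thread, not a replacement for the matrix versions above; it assumes there are no tied values.

```matlab
% Rank-based level assignment for one column vector (no ties assumed).
X = [3 1 4 6 7 2]';
nLevels = 2;

[~, IX] = sort(X);             % IX(r) is the position of the r-th smallest value
invIX = zeros(size(X));
invIX(IX) = 1:numel(X);        % invIX(k) is the rank of X(k)

Y = ceil(invIX/numel(X)*nLevels);
disp(Y');
```

Here the ranks 1 to 3 map to level 1 and the ranks 4 to 6 map to level 2, which matches the quantile-based result shown earlier for the same data.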