Running a program in matlab [duplicate] - matlab

I have gone through Google and Stack Overflow search, but nowhere I was able to find a clear and straightforward explanation for how to calculate time complexity.
What do I know already?
Say for code as simple as the one below:
char h = 'y'; // This will be executed 1 time
int abc = 0; // This will be executed 1 time
Say for a loop like the one below:
for (int i = 0; i < N; i++) {
Console.Write('Hello, World!!');
}
int i=0; This will be executed only once.
The time is actually calculated to i=0 and not the declaration.
i < N; This will be executed N+1 times
i++ This will be executed N times
So the number of operations required by this loop are {1+(N+1)+N} = 2N+2. (But this still may be wrong, as I am not confident about my understanding.)
OK, so these small basic calculations I think I know, but in most cases I have seen the time complexity as O(N), O(n^2), O(log n), O(n!), and many others.

How to find time complexity of an algorithm
You add up how many machine instructions it will execute as a function of the size of its input, and then simplify the expression to the largest (when N is very large) term and can include any simplifying constant factor.
For example, lets see how we simplify 2N + 2 machine instructions to describe this as just O(N).
Why do we remove the two 2s ?
We are interested in the performance of the algorithm as N becomes large.
Consider the two terms 2N and 2.
What is the relative influence of these two terms as N becomes large? Suppose N is a million.
Then the first term is 2 million and the second term is only 2.
For this reason, we drop all but the largest terms for large N.
So, now we have gone from 2N + 2 to 2N.
Traditionally, we are only interested in performance up to constant factors.
This means that we don't really care if there is some constant multiple of difference in performance when N is large. The unit of 2N is not well-defined in the first place anyway. So we can multiply or divide by a constant factor to get to the simplest expression.
So 2N becomes just N.

This is an excellent article: Time complexity of algorithm
The below answer is copied from above (in case the excellent link goes bust)
The most common metric for calculating time complexity is Big O notation. This removes all constant factors so that the running time can be estimated in relation to N as N approaches infinity. In general you can think of it like this:
statement;
Is constant. The running time of the statement will not change in relation to N.
for ( i = 0; i < N; i++ )
statement;
Is linear. The running time of the loop is directly proportional to N. When N doubles, so does the running time.
for ( i = 0; i < N; i++ ) {
for ( j = 0; j < N; j++ )
statement;
}
Is quadratic. The running time of the two loops is proportional to the square of N. When N doubles, the running time increases by N * N.
while ( low <= high ) {
mid = ( low + high ) / 2;
if ( target < list[mid] )
high = mid - 1;
else if ( target > list[mid] )
low = mid + 1;
else break;
}
Is logarithmic. The running time of the algorithm is proportional to the number of times N can be divided by 2. This is because the algorithm divides the working area in half with each iteration.
void quicksort (int list[], int left, int right)
{
int pivot = partition (list, left, right);
quicksort(list, left, pivot - 1);
quicksort(list, pivot + 1, right);
}
Is N * log (N). The running time consists of N loops (iterative or recursive) that are logarithmic, thus the algorithm is a combination of linear and logarithmic.
In general, doing something with every item in one dimension is linear, doing something with every item in two dimensions is quadratic, and dividing the working area in half is logarithmic. There are other Big O measures such as cubic, exponential, and square root, but they're not nearly as common. Big O notation is described as O ( <type> ) where <type> is the measure. The quicksort algorithm would be described as O (N * log(N )).
Note that none of this has taken into account best, average, and worst case measures. Each would have its own Big O notation. Also note that this is a VERY simplistic explanation. Big O is the most common, but it's also more complex that I've shown. There are also other notations such as big omega, little o, and big theta. You probably won't encounter them outside of an algorithm analysis course. ;)

Taken from here - Introduction to Time Complexity of an Algorithm
1. Introduction
In computer science, the time complexity of an algorithm quantifies the amount of time taken by an algorithm to run as a function of the length of the string representing the input.
2. Big O notation
The time complexity of an algorithm is commonly expressed using big O notation, which excludes coefficients and lower order terms. When expressed this way, the time complexity is said to be described asymptotically, i.e., as the input size goes to infinity.
For example, if the time required by an algorithm on all inputs of size n is at most 5n3 + 3n, the asymptotic time complexity is O(n3). More on that later.
A few more examples:
1 = O(n)
n = O(n2)
log(n) = O(n)
2 n + 1 = O(n)
3. O(1) constant time:
An algorithm is said to run in constant time if it requires the same amount of time regardless of the input size.
Examples:
array: accessing any element
fixed-size stack: push and pop methods
fixed-size queue: enqueue and dequeue methods
4. O(n) linear time
An algorithm is said to run in linear time if its time execution is directly proportional to the input size, i.e. time grows linearly as input size increases.
Consider the following examples. Below I am linearly searching for an element, and this has a time complexity of O(n).
int find = 66;
var numbers = new int[] { 33, 435, 36, 37, 43, 45, 66, 656, 2232 };
for (int i = 0; i < numbers.Length - 1; i++)
{
if(find == numbers[i])
{
return;
}
}
More Examples:
Array: Linear Search, Traversing, Find minimum etc
ArrayList: contains method
Queue: contains method
5. O(log n) logarithmic time:
An algorithm is said to run in logarithmic time if its time execution is proportional to the logarithm of the input size.
Example: Binary Search
Recall the "twenty questions" game - the task is to guess the value of a hidden number in an interval. Each time you make a guess, you are told whether your guess is too high or too low. Twenty questions game implies a strategy that uses your guess number to halve the interval size. This is an example of the general problem-solving method known as binary search.
6. O(n2) quadratic time
An algorithm is said to run in quadratic time if its time execution is proportional to the square of the input size.
Examples:
Bubble Sort
Selection Sort
Insertion Sort
7. Some useful links
Big-O Misconceptions
Determining The Complexity Of Algorithm
Big O Cheat Sheet

Several examples of loop.
O(n) time complexity of a loop is considered as O(n) if the loop variables is incremented / decremented by a constant amount. For example following functions have O(n) time complexity.
// Here c is a positive integer constant
for (int i = 1; i <= n; i += c) {
// some O(1) expressions
}
for (int i = n; i > 0; i -= c) {
// some O(1) expressions
}
O(nc) time complexity of nested loops is equal to the number of times the innermost statement is executed. For example, the following sample loops have O(n2) time complexity
for (int i = 1; i <=n; i += c) {
for (int j = 1; j <=n; j += c) {
// some O(1) expressions
}
}
for (int i = n; i > 0; i += c) {
for (int j = i+1; j <=n; j += c) {
// some O(1) expressions
}
For example, selection sort and insertion sort have O(n2) time complexity.
O(log n) time complexity of a loop is considered as O(log n) if the loop variables is divided / multiplied by a constant amount.
for (int i = 1; i <=n; i *= c) {
// some O(1) expressions
}
for (int i = n; i > 0; i /= c) {
// some O(1) expressions
}
For example, [binary search][3] has _O(log n)_ time complexity.
O(log log n) time complexity of a loop is considered as O(log log n) if the loop variables is reduced / increased exponentially by a constant amount.
// Here c is a constant greater than 1
for (int i = 2; i <=n; i = pow(i, c)) {
// some O(1) expressions
}
//Here fun is sqrt or cuberoot or any other constant root
for (int i = n; i > 0; i = fun(i)) {
// some O(1) expressions
}
One example of time complexity analysis
int fun(int n)
{
for (int i = 1; i <= n; i++)
{
for (int j = 1; j < n; j += i)
{
// Some O(1) task
}
}
}
Analysis:
For i = 1, the inner loop is executed n times.
For i = 2, the inner loop is executed approximately n/2 times.
For i = 3, the inner loop is executed approximately n/3 times.
For i = 4, the inner loop is executed approximately n/4 times.
…………………………………………………….
For i = n, the inner loop is executed approximately n/n times.
So the total time complexity of the above algorithm is (n + n/2 + n/3 + … + n/n), which becomes n * (1/1 + 1/2 + 1/3 + … + 1/n)
The important thing about series (1/1 + 1/2 + 1/3 + … + 1/n) is around to O(log n). So the time complexity of the above code is O(n·log n).
References:
1
2
3

Time complexity with examples
1 - Basic operations (arithmetic, comparisons, accessing array’s elements, assignment): The running time is always constant O(1)
Example:
read(x) // O(1)
a = 10; // O(1)
a = 1,000,000,000,000,000,000 // O(1)
2 - If then else statement: Only taking the maximum running time from two or more possible statements.
Example:
age = read(x) // (1+1) = 2
if age < 17 then begin // 1
status = "Not allowed!"; // 1
end else begin
status = "Welcome! Please come in"; // 1
visitors = visitors + 1; // 1+1 = 2
end;
So, the complexity of the above pseudo code is T(n) = 2 + 1 + max(1, 1+2) = 6. Thus, its big oh is still constant T(n) = O(1).
3 - Looping (for, while, repeat): Running time for this statement is the number of loops multiplied by the number of operations inside that looping.
Example:
total = 0; // 1
for i = 1 to n do begin // (1+1)*n = 2n
total = total + i; // (1+1)*n = 2n
end;
writeln(total); // 1
So, its complexity is T(n) = 1+4n+1 = 4n + 2. Thus, T(n) = O(n).
4 - Nested loop (looping inside looping): Since there is at least one looping inside the main looping, running time of this statement used O(n^2) or O(n^3).
Example:
for i = 1 to n do begin // (1+1)*n = 2n
for j = 1 to n do begin // (1+1)n*n = 2n^2
x = x + 1; // (1+1)n*n = 2n^2
print(x); // (n*n) = n^2
end;
end;
Common running time
There are some common running times when analyzing an algorithm:
O(1) – Constant time
Constant time means the running time is constant, it’s not affected by the input size.
O(n) – Linear time
When an algorithm accepts n input size, it would perform n operations as well.
O(log n) – Logarithmic time
Algorithm that has running time O(log n) is slight faster than O(n). Commonly, algorithm divides the problem into sub problems with the same size. Example: binary search algorithm, binary conversion algorithm.
O(n log n) – Linearithmic time
This running time is often found in "divide & conquer algorithms" which divide the problem into sub problems recursively and then merge them in n time. Example: Merge Sort algorithm.
O(n2) – Quadratic time
Look Bubble Sort algorithm!
O(n3) – Cubic time
It has the same principle with O(n2).
O(2n) – Exponential time
It is very slow as input get larger, if n = 1,000,000, T(n) would be 21,000,000. Brute Force algorithm has this running time.
O(n!) – Factorial time
The slowest!!! Example: Travelling salesman problem (TSP)
It is taken from this article. It is very well explained and you should give it a read.

When you're analyzing code, you have to analyse it line by line, counting every operation/recognizing time complexity. In the end, you have to sum it to get whole picture.
For example, you can have one simple loop with linear complexity, but later in that same program you can have a triple loop that has cubic complexity, so your program will have cubic complexity. Function order of growth comes into play right here.
Let's look at what are possibilities for time complexity of an algorithm, you can see order of growth I mentioned above:
Constant time has an order of growth 1, for example: a = b + c.
Logarithmic time has an order of growth log N. It usually occurs when you're dividing something in half (binary search, trees, and even loops), or multiplying something in same way.
Linear. The order of growth is N, for example
int p = 0;
for (int i = 1; i < N; i++)
p = p + 2;
Linearithmic. The order of growth is n·log N. It usually occurs in divide-and-conquer algorithms.
Cubic. The order of growth is N3. A classic example is a triple loop where you check all triplets:
int x = 0;
for (int i = 0; i < N; i++)
for (int j = 0; j < N; j++)
for (int k = 0; k < N; k++)
x = x + 2
Exponential. The order of growth is 2N. It usually occurs when you do exhaustive search, for example, check subsets of some set.

Loosely speaking, time complexity is a way of summarising how the number of operations or run-time of an algorithm grows as the input size increases.
Like most things in life, a cocktail party can help us understand.
O(N)
When you arrive at the party, you have to shake everyone's hand (do an operation on every item). As the number of attendees N increases, the time/work it will take you to shake everyone's hand increases as O(N).
Why O(N) and not cN?
There's variation in the amount of time it takes to shake hands with people. You could average this out and capture it in a constant c. But the fundamental operation here --- shaking hands with everyone --- would always be proportional to O(N), no matter what c was. When debating whether we should go to a cocktail party, we're often more interested in the fact that we'll have to meet everyone than in the minute details of what those meetings look like.
O(N^2)
The host of the cocktail party wants you to play a silly game where everyone meets everyone else. Therefore, you must meet N-1 other people and, because the next person has already met you, they must meet N-2 people, and so on. The sum of this series is x^2/2+x/2. As the number of attendees grows, the x^2 term gets big fast, so we just drop everything else.
O(N^3)
You have to meet everyone else and, during each meeting, you must talk about everyone else in the room.
O(1)
The host wants to announce something. They ding a wineglass and speak loudly. Everyone hears them. It turns out it doesn't matter how many attendees there are, this operation always takes the same amount of time.
O(log N)
The host has laid everyone out at the table in alphabetical order. Where is Dan? You reason that he must be somewhere between Adam and Mandy (certainly not between Mandy and Zach!). Given that, is he between George and Mandy? No. He must be between Adam and Fred, and between Cindy and Fred. And so on... we can efficiently locate Dan by looking at half the set and then half of that set. Ultimately, we look at O(log_2 N) individuals.
O(N log N)
You could find where to sit down at the table using the algorithm above. If a large number of people came to the table, one at a time, and all did this, that would take O(N log N) time. This turns out to be how long it takes to sort any collection of items when they must be compared.
Best/Worst Case
You arrive at the party and need to find Inigo - how long will it take? It depends on when you arrive. If everyone is milling around you've hit the worst-case: it will take O(N) time. However, if everyone is sitting down at the table, it will take only O(log N) time. Or maybe you can leverage the host's wineglass-shouting power and it will take only O(1) time.
Assuming the host is unavailable, we can say that the Inigo-finding algorithm has a lower-bound of O(log N) and an upper-bound of O(N), depending on the state of the party when you arrive.
Space & Communication
The same ideas can be applied to understanding how algorithms use space or communication.
Knuth has written a nice paper about the former entitled "The Complexity of Songs".
Theorem 2: There exist arbitrarily long songs of complexity O(1).
PROOF: (due to Casey and the Sunshine Band). Consider the songs Sk defined by (15), but with
V_k = 'That's the way,' U 'I like it, ' U
U = 'uh huh,' 'uh huh'
for all k.

For the mathematically-minded people: The master theorem is another useful thing to know when studying complexity.

O(n) is big O notation used for writing time complexity of an algorithm. When you add up the number of executions in an algorithm, you'll get an expression in result like 2N+2. In this expression, N is the dominating term (the term having largest effect on expression if its value increases or decreases). Now O(N) is the time complexity while N is dominating term.
Example
For i = 1 to n;
j = 0;
while(j <= n);
j = j + 1;
Here the total number of executions for the inner loop are n+1 and the total number of executions for the outer loop are n(n+1)/2, so the total number of executions for the whole algorithm are n + 1 + n(n+1/2) = (n2 + 3n)/2.
Here n^2 is the dominating term so the time complexity for this algorithm is O(n2).

Other answers concentrate on the big-O-notation and practical examples. I want to answer the question by emphasizing the theoretical view. The explanation below is necessarily lacking in details; an excellent source to learn computational complexity theory is Introduction to the Theory of Computation by Michael Sipser.
Turing Machines
The most widespread model to investigate any question about computation is a Turing machine. A Turing machine has a one dimensional tape consisting of symbols which is used as a memory device. It has a tapehead which is used to write and read from the tape. It has a transition table determining the machine's behaviour, which is a fixed hardware component that is decided when the machine is created. A Turing machine works at discrete time steps doing the following:
It reads the symbol under the tapehead.
Depending on the symbol and its internal state, which can only take finitely many values, it reads three values s, σ, and X from its transition table, where s is an internal state, σ is a symbol, and X is either Right or Left.
It changes its internal state to s.
It changes the symbol it has read to σ.
It moves the tapehead one step according to the direction in X.
Turing machines are powerful models of computation. They can do everything that your digital computer can do. They were introduced before the advent of digital modern computers by the father of theoretical computer science and mathematician: Alan Turing.
Time Complexity
It is hard to define the time complexity of a single problem like "Does white have a winning strategy in chess?" because there is a machine which runs for a single step giving the correct answer: Either the machine which says directly 'No' or directly 'Yes'. To make it work we instead define the time complexity of a family of problems L each of which has a size, usually the length of the problem description. Then we take a Turing machine M which correctly solves every problem in that family. When M is given a problem of this family of size n, it solves it in finitely many steps. Let us call f(n) the longest possible time it takes M to solve problems of size n. Then we say that the time complexity of L is O(f(n)), which means that there is a Turing machine which will solve an instance of it of size n in at most C.f(n) time where C is a constant independent of n.
Isn't it dependent on the machines? Can digital computers do it faster?
Yes! Some problems can be solved faster by other models of computation, for example two tape Turing machines solve some problems faster than those with a single tape. This is why theoreticians prefer to use robust complexity classes such as NL, P, NP, PSPACE, EXPTIME, etc. For example, P is the class of decision problems whose time complexity is O(p(n)) where p is a polynomial. The class P do not change even if you add ten thousand tapes to your Turing machine, or use other types of theoretical models such as random access machines.
A Difference in Theory and Practice
It is usually assumed that the time complexity of integer addition is O(1). This assumption makes sense in practice because computers use a fixed number of bits to store numbers for many applications. There is no reason to assume such a thing in theory, so time complexity of addition is O(k) where k is the number of bits needed to express the integer.
Finding The Time Complexity of a Class of Problems
The straightforward way to show the time complexity of a problem is O(f(n)) is to construct a Turing machine which solves it in O(f(n)) time. Creating Turing machines for complex problems is not trivial; one needs some familiarity with them. A transition table for a Turing machine is rarely given, and it is described in high level. It becomes easier to see how long it will take a machine to halt as one gets themselves familiar with them.
Showing that a problem is not O(f(n)) time complexity is another story... Even though there are some results like the time hierarchy theorem, there are many open problems here. For example whether problems in NP are in P, i.e. solvable in polynomial time, is one of the seven millennium prize problems in mathematics, whose solver will be awarded 1 million dollars.

Related

Approximate pi using finitie series

Given the equation to approximate pi
I need to the number of terms (n) that are needed to obtain an approximation that is within 10^(-12) of the actual value of pi. The code I have to find the n looks like this:
The while loop statement I have seems to never end, so I feel like my code must be wrong.
Try something along these lines (transcribed from your image), incrementing the number of approximation terms n inside your infinite while loop:
s = 1
n = 1
while true
s = abs(pi - approximate_pi(n))
if s <= 0.001
break
end
n = n + 1
end
On a related note, this calculation is a little bit pointless if you know the value of pi beforehand. Termination condition should be on the absolute magnitude of the n-th term.
The way you're doing it makes sense only if you're trying to find out minimum n for which your approximation series produces the result within some margin of error.
Edit. So, normally you would do it like this:
n = 1;
sum_running = 0
sum_target = (pi^2 - 8) / 16;
while true
sum_running += 1 / ((2*n-1)^2 * (2*n+1)^2);
if abs(sum_target - sum_running) <= 10e-12
break
end
n += 1;
end
pi_approx = sqrt(16*sum_running + 8)
There's no need to keep recalculating pi approximation up to n terms, for each new n. This is has O(n) complexity, while your initial solution had O(n^2), so it's much faster for large n.

Approximating an integral in MATLAB

I've been trying to implement the following integral in MATLAB
Given a number n, I wrote the code that returns an array with n elements, containing approximations of each integral.
First, I tried this using a 'for' loop and the recurrence relationship on the first line. But from the 20th integral and above the values are completely wrong (correct to 0 significant figures and wrong sign).
The same goes if I use the explicit formula on the second line and two 'for' loops.
As n grows larger, so does the error on the approximations.
So the main issue here is that I haven't found a way to minimize the error as much as possible.
Any ideas? Thanks in advance.
Here is an example of the code and the resulting values, using the second formula:
This integral, for positive values of n, cannot have values >1 or <0
First attempt:
I tried the iterative method and found interesting thing. The approximation may not be true for all n. In fact if I keep track of (n-1)*I(n-1) in each loop I can see
I = zeros(20,3);
I(1,1) = 1-1/exp(1);
for ii = 2:20
I(ii,2) = ii-1;
I(ii,3) = (ii-1)*I(ii-1,1);
I(ii,1) = 1-I(ii,3);
end
There is some problem around n=18. In fact, I18 = 0.05719 and 18*I18 = 1.029 which is larger than 1. I don't think there is any numerical error or number overflow in this procedure.
Second attempt:
To make sure the maths is correct (I verified twice on paper) I used trapz to numerically evaluate the integral, and n=18 didn't cause any problem.
>> x = linspace(0,1,1+1e4);
>> f = #(n) exp(-1)*exp(x).*x.^(n-1);
>> f = #(n) exp(-1)*exp(x).*x.^(n-1)*1e-4;
>> trapz(f(5))
ans =
1.708934160520510e-01
>> trapz(f(17))
ans =
5.571936009790170e-02
>> trapz(f(18))
ans =
5.277113416899408e-02
>>
A closer look is as follows. I18 is slightly different (to the 4th significant digit) between the (stable) numerical method and (unstable) iterative method. 18*I18 is therefore possible to exceed 1.
I = zeros(20,3);
I(1,1) = 1-1/exp(1);
for ii = 2:20
I(ii,2) = ii-1;
I(ii,3) = (ii-1)*I(ii-1,1);
I(ii,1) = 1-I(ii,3);
end
J = zeros(20,3);
x = linspace(0,1,1+1e4);
f = #(n) exp(-1)*exp(x).*x.^(n-1)*1e-4;
J(1,1) = trapz(f(1));
for jj = 2:20
J(jj,1) = trapz(f(jj));
J(jj,2) = jj-1;
J(jj,3) = (jj-1)*J(jj-1,1);
end
I suspect there is an error in each iterative step due to the nature of numerical computations. If the iteration is long, the error propagates and, unfortunately in this case, amplifies rapidly. In order to verify this, I combined the above two methods into a hybrid algo. For most of the time the iterative way is used, and once in a while a numerical integral is evaluated from scratch without relying on previous iterations.
K = zeros(40,4);
K(1,1) = 1-1/exp(1);
for kk = 2:40
K(kk,2) = trapz(f(kk));
K(kk,3) = (kk-1)*K(kk-1,1);
K(kk,4) = 1-K(kk,3);
if mod(kk,5) == 0
K(kk,1) = K(kk,2);
else
K(kk,1) = K(kk,4);
end
end
If the iteration lasts more than 4 steps, error amplification will be large enough to invert the sign, and starts nonrecoverable oscillation.
The code should be able to explain all the data structures. Anyway, let me put some focus here. The second column is the result of trapz, which is the numerical integral done on the non-iterative integration definition of I(n). The third column is (n-1)*I(n-1) and should be always positive and less than 1. The forth column is 1-(n-1)*I(n-1) and should always be positive. The first column is the choice I have made between the trapz result and iterative result, to be the "true" value of I(n).
As can be seen here, in each iteration there is a small error compared to the independent numerical way. The error grows in the 3rd and 4th iteration and finally breaks the thing in its 5th. This is observed around n=25, under the case that I pick the numerical result in every 5 loops since the beginning.
Conclusion: There is nothing wrong with any definition of this integral. However the numerical error when evaluating the expressions is unfortunately aggregating, hence limiting the way you can perform the computation.

Simulating the exponential function with pertubation

for some simulations, I need to make use of an approximation of the exponential function. Now, the problem that I have is that:
function s=expone(N,k)
s=0
for j=1:k
s=s+(exp(-N+j*log(N)-log(factorial(j))));
end
end
is a pretty stable, in the sense that it is almost 1 for k large enough. However, as soon as N is bigger than 200, it quickly drops to zero. How can I improve that, I need large N. I cannot really change the mathematical why of writing this, since I have an additional pertubation, my final code will look something lie:
function s=expone(N,k)
s=0
for j=1:k
s=s+(exp(-N+j*log(N)-log(factorial(j))))*pertubation(N,k);
end
end
THe pertubation is between 0 and 1, so that's no problem, but the prefactor seems not to work for N>200. Can anyone help?
Thanks a lot!
The function log(x) - x has positive and negative part
Graphic in Wolframalpha
while x - log(x!) is negative for x>= 0
Graphic in Wolframalpha
So the problem arise when (N - log(N) ) is much greater than (j - log(j) ). So the solution is to choose a j much bigger than N . Exp(negative) tends to zero
for example expone(20,1) = 7.1907e-05 but expone(20,20) = 0.5591 and expone (20,50) = 1.000
As conclusion, if you want to work with N big, j should be bigger, and as an extra tip you may want to change you function to avoid for loops:
function s = expone(N,k)
j = 1:k;
s = sum ((exp(-N+j*log(N)-log(factorial(j)))));
end

Fast way to compute (1:N)'*(1:N)

I am looking for a fast way to compute
(1:N)'*(1:N)
for reasonably large N. I feel like the symmetry of the problem makes it so that actually doing the multiplications and additions is wasteful.
The question of why you want to do this really matters.
In the theoretical sense, the triangular approach suggested in the other answers will save you operations. #jgmao's answer is especially interesting in reducing multiplies.
In the practical sense, number of CPU operations is no longer the metric to minimize when writing fast code. Memory bandwidth dominates when you have so few CPU operations, so tuned cache-aware access patterns are how to make this go fast. Matrix multiplication code is implemented extremely efficiently, since it's such a common operation, and every implementation of the BLAS numeric library worth its salt will use optimized access patterns, and SIMD computation as well.
Even if you wrote straight C and reduced your op count to the theoretic minimum, you'd probably still not beat the full matrix multiply. What this boils down to is to find the numeric primitive which most closely matches your operation.
All that said, there's a BLAS operation which gets a little closer than DGEMM (matrix multiply). It's called DSYRK, the rank-k update, and it can be used for exactly A'*A. The MEX function I wrote for this a long time ago is here. I haven't messed with it in a long time, but it did work when I first wrote it, and did in fact run faster than a straight A'*A.
/* xtrx.c: calculates x'*x taking advantage of the symmetry.
Peter Boettcher <email removed>
Last modified: <Thu Jan 23 13:53:02 2003> */
#include "mex.h"
const double one = 1;
const double zero = 0;
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
double *x, *z;
int i, j, mrows, ncols;
if(nrhs!=1) mexErrMsgTxt("One input required.");
x = mxGetPr(prhs[0]);
mrows = mxGetM(prhs[0]);
ncols = mxGetN(prhs[0]);
plhs[0] = mxCreateDoubleMatrix(ncols,ncols, mxREAL);
z = mxGetPr(plhs[0]);
/* Call the FORTRAN BLAS routine for rank k update */
dsyrk_("U", "T", &ncols, &mrows, &one, x, &mrows, &zero, z, &ncols);
/* Result is in the upper triangle. Copy it down the lower part */
for(i=0; i<ncols; i++)
for(j=i+1; j<ncols; j++)
z[i*ncols + j] = z[j*ncols + i];
}
MATLAB's matrix multiplication is generally pretty fast, but here are a couple of ways to get just the upper triangular matrix. They are slower than naïvely computing the v'*v (or using a MEX wrapper that calls the more appropriate symmetric rank k update function in BLAS, not surprisingly!). Anyway, here are a few MATLAB-only solutions:
The first uses linear indexing:
% test vector
N = 1e3;
v = 1:N;
% compute upper triangle of product
[ii, jj] = find(triu(ones(N)));
upperMask = false(N,N);
upperMask(ii + N*(jj-1)) = true;
Mu = zeros(N);
Mu(upperMask) = v(ii).*v(jj); % other lines always the same computation
% validate
M = v'*v;
isequal(triu(M),Mu)
This next way won't be faster than the naive approach either, but here's another solution to compute the lower triangle with bsxfun:
Ml = bsxfun(#(x,y) [zeros(y-1,1); x(y:end)*y],v',v);
For the upper triangle:
Mu = bsxfun(#(x,y) [x(1:y)*y; zeros(numel(x)-y,1)],v',v);
isequal(triu(M),Mu)
Another solution for the whole matrix using cumsum for this special case (where v=1:N). This one is actually close in speed.
M = cumsum(repmat(v,[N 1]));
Maybe these can be a starting point for something better.
This is 3 times faster than (1:N).'*(1:N) provided an int32 result is acceptable (it's even faster if the numbers are small enough to use int16 instead of int32):
N = 1000;
aux = int32(1:N);
result = bsxfun(#times,aux.',aux);
Benchmarking:
>> N = 1000; aux = int32(1:N); tic, for count = 1:1e2, bsxfun(#times,aux.',aux); end, toc
Elapsed time is 0.734992 seconds.
>> N = 1000; aux = 1:N; tic, for count = 1:1e2, aux.'*aux; end, toc
Elapsed time is 2.281784 seconds.
Note that aux.'*aux cannot be used for aux = int32(1:N).
As pointed out by #DanielE.Shub, if the result is needed as a double matrix, a final cast has to be done, and in that case the gain is very small:
>> N = 1000; aux = int32(1:N); tic, for count = 1:1e2, double(bsxfun(#times,aux.',aux)); end, toc
Elapsed time is 2.173059 seconds.
Since the special ordered structure of the input, consider the case N=4
(1:4)'*(1:4) = [1 2 3 4
2 4 6 8
3 6 9 12
4 8 12 16]
you will find that 1st row is just (1:N), from second (j=2) row, the value of this row is previous row (j=1) plus (1:N).
So 1. you do not to do many multiplications. Instead, you can generate it by N*N additions.
2. since the output is symmetric, only half of the output matrix need to be computed. So the total computation is (N-1)+(N-2)+...+1 = N^2 / 2 additions.

Matlab Code To Approximate The Exponential Function

Does anyone know how to make the following Matlab code approximate the exponential function more accurately when dealing with large and negative real numbers?
For example when x = 1, the code works well, when x = -100, it returns an answer of 8.7364e+31 when it should be closer to 3.7201e-44.
The code is as follows:
s=1
a=1;
y=1;
for k=1:40
a=a/k;
y=y*x;
s=s+a*y;
end
s
Any assistance is appreciated, cheers.
EDIT:
Ok so the question is as follows:
Which mathematical function does this code approximate? (I say the exponential function.) Does it work when x = 1? (Yes.) Unfortunately, using this when x = -100 produces the answer s = 8.7364e+31. Your colleague believes that there is a silly bug in the program, and asks for your assistance. Explain the behaviour carefully and give a simple fix which produces a better result. [You must suggest a modification to the above code, or it's use. You must also check your simple fix works.]
So I somewhat understand that the problem surrounds large numbers when there is 16 (or more) orders of magnitude between terms, precision is lost, but the solution eludes me.
Thanks
EDIT:
So in the end I went with this:
s = 1;
x = -100;
a = 1;
y = 1;
x1 = 1;
for k=1:40
x1 = x/10;
a = a/k;
y = y*x1;
s = s + a*y;
end
s = s^10;
s
Not sure if it's completely correct but it returns some good approximations.
exp(-100) = 3.720075976020836e-044
s = 3.722053303838800e-044
After further analysis (and unfortunately submitting the assignment), I realised increasing the number of iterations, and thus increasing terms, further improves efficiency. In fact the following was even more efficient:
s = 1;
x = -100;
a = 1;
y = 1;
x1 = 1;
for k=1:200
x1 = x/200;
a = a/k;
y = y*x1;
s = s + a*y;
end
s = s^200;
s
Which gives:
exp(-100) = 3.720075976020836e-044
s = 3.720075976020701e-044
As John points out in a comment, you have an error inside the loop. The y = y*k line does not do what you need. Look more carefully at the terms in the series for exp(x).
Anyway, I assume this is why you have been given this homework assignment, to learn that series like this don't converge very well for large values. Instead, you should consider how to do range reduction.
For example, can you use the identity
exp(x+y) = exp(x)*exp(y)
to your advantage? Suppose you store the value of exp(1) = 2.7182818284590452353...
Now, if I were to ask you to compute the value of exp(1.3), how would you use the above information?
exp(1.3) = exp(1)*exp(0.3)
But we KNOW the value of exp(1) already. In fact, with a little thought, this will let you reduce the range for an exponential down to needing the series to converge rapidly only for abs(x) <= 0.5.
Edit: There is a second way one can do range reduction using a variation of the same identity.
exp(x) = exp(x/2)*exp(x/2) = exp(x/2)^2
Thus, suppose you wish to compute the exponential of large number, perhaps 12.8. Getting this to converge acceptably fast will take many terms in the simple series, and there will be a great deal of subtractive cancellation happening, so you won't get good accuracy anyway. However, if we recognize that
12.8 = 2*6.4 = 2*2*3.2 = ... = 16*0.8
then IF you could efficiently compute the exponential of 0.8, then the desired value is easy to recover, perhaps by repeated squaring.
exp(12.8)
ans =
362217.449611248
a = exp(0.8)
a =
2.22554092849247
a = a*a;
a = a*a;
a = a*a;
a = a*a
362217.449611249
exp(0.8)^16
ans =
362217.449611249
Note that WHENEVER you do range reduction using methods like this, while you may incur numerical problems due to the additional computations necessary, you will usually come out way ahead due to the greatly enhanced convergence of your series.
Why do you think that's the wrong answer? Look at the last term of that sequence, and it's size, and tell me why you expect you should have an answer that's close to 0.
My original answer stated that roundoff error was the problem. That will be a problem with this basic approach, but why do you think 40 is enough terms for the appropriate mathematical ( as opposed to computer floating point arithmetic) answer.
100^40 / 40! ~= 10^31.
Woodchip has the right idea with range reduction. That's the typical approach people use to implement these kinds of functions very quickly. Once you get that all figured out, you deal with roundoff errors of alternating sequences, by summing adjacent terms within the loop, and stepping with k = 1 : 2 : 40 (for instance). That doesn't work here until you use woodchips's idea because for x = -100, the summands grow for a very long time. You need |x| < 1 to guarantee intermediate terms are shrinking, and thus a rewrite will work.