Sum numbers and finding maximum, Matlab

I want to sum the powers (r) of the primes in primlist that divide a number, and also pick out the greatest power (k) for which a prime in primlist still divides the number. I believe I have the right concept, but MATLAB is overlooking something in the loop. For example, when numbas=45 we know 3^2*5=45, so the primes are 3 and 5; for 3: k=2, r=1,2 and for 5: k=1, r=1. Here r is simply every power of a prime that divides numbas, and k is the highest value of r. I want to sum all the r's and also get the maximum r, which is k.
n=100;
primlist=2;
for numba=1:n;
if mod(2+numba,primlist)~=0
primlist=[primlist;2+numba];
end
end
prims=reshape(primlist.',1,[]);
r=1;
for numbas=2:n
for k=1:10
if mod(numbas,prims.^k)==0
r=r+sum(k) % sum of all the powers of prims, such that prims divide numbas
k=max(k) % greatest power of prims, such that prims divide numbas
end
end
end
numbas
prims
k
r

I think this does what you are looking for. I added an extra loop to test whether each prime divides numbas one by one, rather than all at once. r contains the primes that divide numbas and pow contains the corresponding exponent of each. You may want to change how the results are saved, but I think this gives you what you were after.
n=100;
primlist=2;
for numba=1:n;
if mod(2+numba,primlist)~=0
primlist=[primlist;2+numba];
end
end
kMax=0;
for numbas=2:n
r=zeros(size(primlist));
pow=r;
for k=1:10
for i=1:length(primlist)
if mod(numbas,primlist(i).^k)==0
r(i)=primlist(i); % record the prime that divides numbas
pow(i)=k; % highest exponent found so far for this prime
if k>kMax,kMax=k;end % greatest exponent seen so far over all numbers and primes
end
end
end
R=r'
POW=pow'
end
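For comparison, a shorter sketch can lean on the built-in factor function instead of testing every prime power. This is only a sketch under assumptions: the names expo, kMaxOne and rSum are made up here, and it reads "sum all the r's" as summing 1, 2, ..., k for each prime, which may not be exactly what the question intends.

```matlab
numbas = 45;                       % example from the question: 45 = 3^2 * 5
f = factor(numbas);                % prime factors with repetition
[p, ~, idx] = unique(f);           % distinct primes and, for each factor, its group
expo = accumarray(idx(:), 1).';    % exponent of each distinct prime
kMaxOne = max(expo);               % greatest exponent k for this number
% Assumed reading of "sum of all r": r runs over 1..k for each prime
rSum = sum(expo .* (expo + 1) / 2);
```

If only the total number of prime factors is wanted, sum(expo) would be the simpler quantity.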

Related

Performance analysis of series summation in Matlab

I'm writing a Matlab program to compute pi through the summation of series
A = Sum of a_i from i=1 to N
where
pi/4 = 1 - 1/3 + 1/5 - 1/7 + 1/9 - 1/11 + 1/13 ...
To compute pi through the series summation, the suggested approach is to set
a_i = (-1)^(i+1)/(2i-1)
In order to do this, I wrote the program below
n=100;
f=[];
for jj=1:n
ii=1:jj;
f=[f 4*sum( ((-1).^(ii+1))./(2.*ii-1) )];
end;
hold on
plot(f)
title('Computing of \pi using a finite sum')
xlabel('Number of summations')
ylabel('Estimated value of \pi')
plot(1:size(f,2),ones(size(f))*pi)
This program shows that the series approximation is somewhat accurate near N=80.
I am now attempting to adjust my program so that the y-axis displays total calculation time T_N and the x-axis displays N (the number of summations). The total calculation time T_N should increase as N increases. Ideally, I am looking to have the graph display something close to a linear relationship between T(N) and N
To do this, I have adjusted my original program as follows
n=100;
f=[];
tic
for jj=1:n
ii=1:jj;
f=[f 4*sum( ((-1).^(ii+1))./(2.*ii-1) )];
end;
hold on
plot(f)
title('Time it takes to sum \pi using N summations')
xlabel('Number of summations (N)')
ylabel('Total Time (T_N)')
plot(1:size(f,2),toc)
slope = polyfit(1:size(f,2),toc,1);
This looks wrong. I must have incorrectly applied the built-in timing functions in Matlab (tic and toc). So, I am going to analyze my code and ask two questions -
How could I adjust my code above so that the y-axis correctly displays the total calculation time per the summation N? It looks like I did something wrong in plot(1:size(f,2),toc).
After I get the y-axis to display the correct total calculation time (T_N), I should be able to use the polyfit command to find the slope of T(N)/N. This will give me a linear relationship between T(N) and N. I could then use the value of slope = polyfit(1:size(f,2),toc,1) to compute
t_N = a + b*N
where t_N is computed for every value of N and b is the slope calculated through the polyfit command.
I think that I should be able to find the values of a and b after I correctly display the y-axis and correctly reference the polyfit command.

There are several things that can be improved in your code:
f should be preallocated, so as not to waste time in repeatedly assigning memory.
tic should be called within the loop, in order to restart the stopwatch timer.
When you call toc you get the time elapsed since the last tic. The time spent should be stored in a vector (also preallocated).
Since the computations you want to time are very fast, measuring the time they take is very unreliable. The computations should be repeated many times, so the measured time is larger and you get better accuracy. Even better would be to use timeit (see below).
You cannot plot the time and the results in the same figure, because the scales are too different.
The code incorporating these changes is:
n = 100;
f = NaN(1,n); % preallocate
times = NaN(1,n); % preallocate
repeat_factor = 1e4; % repeat computations for better time accuracy
for jj=1:n
tic % initialize time count
for repeat = 1:repeat_factor % repeat many times for better time accuracy
ii=1:jj;
f(jj) = 4*sum( ((-1).^(ii+1))./(2.*ii-1) ); % store value
end
times(jj) = toc; % store time
end
times = times / repeat_factor; % divide by repeat factor
plot(f) % estimated value of \pi
figure % new figure for time
plot(1:size(f,2), times)
title('Time it takes to sum \pi using N summations')
xlabel('Number of summations (N)')
ylabel('Total Time (T_N)')
p = polyfit(1:size(f,2),times,1);
slope = p(1);
Using timeit for measuring the time will probably give improved accuracy (but not very good because, as mentioned above, the computations you want to time are very fast). To use timeit you need to define a function with the code to be timed. The simplest way is to use an anonymous function without input arguments. See code below.
n = 100;
f = NaN(1,n); % preallocate
times = NaN(1,n); % preallocate
for jj=1:n
ii=1:jj;
fun = @() 4*sum( ((-1).^(ii+1))./(2.*ii-1) );
f(jj) = fun(); % store value
times(jj) = timeit(fun); % measure and store time
end
plot(f) % estimated value of \pi
figure % new figure for time
plot(1:size(f,2), times)
title('Time it takes to sum \pi using N summations')
xlabel('Number of summations (N)')
ylabel('Total Time (T_N)')
p = polyfit(1:size(f,2),times,1);
slope = p(1);
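As a side note, each f(jj) above is recomputed from scratch. A minimal sketch, assuming only the values of f are needed (not the per-N timings), gets all partial sums in one pass with cumsum:

```matlab
n = 100;
ii = 1:n;
terms = ((-1).^(ii+1)) ./ (2.*ii - 1);   % a_i = (-1)^(i+1)/(2i-1)
f = 4 * cumsum(terms);                   % f(N) = 4 * (a_1 + ... + a_N)
```

This is not a replacement for the timing experiment, since it no longer isolates the cost of a single N; it only avoids the repeated work when the goal is the convergence plot.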

If I understand your problem correctly, I think there are two different issues here. First, you plot your result function then the elapsed time which is several orders of magnitude smaller than pi:
hold on
plot(f) % <---- Comment me out!
...stuff...
plot(1:size(f,2),toc)
Secondly, you need to store the execution time of each pass of the loop:
n=100;
f=[];
telapsed = zeros(1,n);
tic
for jj=1:n
ii=1:jj;
f=[f 4*sum( ((-1).^(ii+1))./(2.*ii-1) )];
telapsed(jj) = toc;
end
hold on
% plot(f)
title('Time it takes to sum \pi using N summations')
xlabel('Number of summations (N)')
ylabel('Total Time (T_N)')
plot(1:n,telapsed)
slope = polyfit(1:n,telapsed,1);
Note the new polyfit expression for slope of the execution time. Does that help?
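polyfit with degree 1 returns two coefficients, highest power first, so both a and b of t_N = a + b*N can be read off the result. A small sketch on synthetic data (a_true and b_true are hypothetical values chosen for illustration, not measurements):

```matlab
N = 1:100;
a_true = 0.5;                 % hypothetical intercept
b_true = 0.02;                % hypothetical slope
tN = a_true + b_true * N;     % made-up "timings" that are exactly linear
p = polyfit(N, tN, 1);        % p(1) multiplies N, p(2) is the constant term
b = p(1);
a = p(2);
```

With real timings the same two lines apply, with the measured time vector in place of tN.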

How to show that randperm() in matlab is fair

Suppose I wanted to show (empirically) that randperm(n,k) in MATLAB indeed produces uniformly distributed random samples of size k from a set N of n elements. How can I plot the number of occurrences divided by the total number of k-subsets drawn from N, after drawing repeatedly?

You can simply use the indices drawn from randperm to increment a counter vector.
n=1e5;
k=1e4;
maxiter = 1e5;
% This array will be used to count the number of times each integer has been drawn
count=zeros(n,1);
for ii=1:maxiter
p=randperm(n,k);
% p is a vector of k distinct integers in the 1:n range
% the array count will be incremented at indices given by p
count(p)=count(p)+1;
end
% A total of k*maxiter integers has been drawn and they should be evenly
% distributed over n values
% The following vector should have values close to 1 for large values of maxiter
prob = count*n/(k*maxiter);
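The same counting idea can be tried on a much smaller case, where it runs quickly and the deviation from 1 is easy to inspect. The sizes below and the name maxDev are arbitrary choices for illustration:

```matlab
n = 20;            % small population (arbitrary)
k = 5;             % sample size (arbitrary)
maxiter = 2e4;     % number of repeated draws
count = zeros(n,1);
for ii = 1:maxiter
    p = randperm(n, k);          % k distinct integers in 1:n
    count(p) = count(p) + 1;     % tally each drawn value
end
prob = count * n / (k * maxiter);    % should hover around 1
maxDev = max(abs(prob - 1));         % largest relative deviation observed
```

Calling bar(prob) afterwards gives the plot asked for; a roughly flat profile near 1 is consistent with uniform sampling, although it is not a formal test.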

How can I use Matlab to calculate the probability of getting at least $q$ zeros in a binary sequence?

For a (non-educational) project of a friend of mine I need to do some calculations. I have very little experience in MATLAB (I had some lectures in Java 7 years ago, so I know what a for loop etc. is).
Question
Given a binary sequence S of length m, with n<m ones and (m-n) zeros (we don't know which elements have value 0 or 1).
Suppose we modify the sequence by randomly setting x<m fixed elements of the sequence to a one or to a zero (each with probability p = 1/2). What is the probability that the number of zeros is at least q < m?
(By 'fixed elements' I mean that we know which elements of the sequence could be changed, i.e. either flipped to the other value or left the same.)
So I'm looking for a method to calculate P(m,n,x,q) for example in settings with values like
m = 1000;
0 < n < 0.3m
0 < x < 0.6m;
0.5 m < q < m

It is not clear (at least for me), whether n, x and q are random numbers, or fixed values. I've written something as if they were random numbers. It's easily changeable if it's not the case.
Every line of the code is explained in comments (%) in the code.
%This is the number of bits
m=1000;
%This is a counter needed for the probability calculation
count=0;
%Loop needed for the probability calculation
for i = 1:1000000
%This is a random number generator that generates m (1000 in this case)
%random numbers between 0 and 1
randomseq=rand(1,m);
%This creates a random number between 0 and 1 and then multiplies it by
%0.29 so it can actually be between 0 and 0.29. If n is not a random
%number, then change it with the fixed value that it has.
n=rand(1,1)*0.29;
%This line is for the following:
%If a number is less than n, then make it 1, else make it 0 (this is
%needed so you can acquire a probability for the number of ones and the
%number of zeros). Here you will have around n*1000 ones, but not
%equal since randomseq is a random vector with 1000 elements and those
%are too few numbers so it can create a real distribution.
onezerorand=randomseq<n;
%That's why this is needed, just in case the number of ones goes over
%30%. If it goes over 30%, then make the numones coef 0.29
if sum(onezerorand)>=300
numones=0.29;
else
%This is a coef which represent the number of ones, but in probability
%terms (between 0 and 1). Mean just calculates the mean value between
%the zeros and the ones, which is essentially the probability.
numones=mean(onezerorand);
end
%The rest of the numbers are zero, so you get the probability by
%subtracting numones from 1
numzeros=1-numones;
%This creates a random number x between 0 and 0.59. If x is not a
%random number, the same comment goes as for n.
x=rand(1,1)*0.59;
%There is a 50% chance that x of the ones will be changed to zeros and
%vice versa, so you create a random number between 0 and 0.5 and
%multiply it by x.
p0=rand(1,1)*0.5;
p1=1-p0;
x_ones=rand(1,1)*p0*x;
x_zeros=rand(1,1)*p1*x;
%You add the changed zeros to the ones and the changed ones to the
%zeros and you subtract the changed ones from the ones and the changed
%zeros from the zeros, so you can get the last value of the number of
%zeros.
new_numones=numones-numones*x_ones+numzeros*x_zeros;
new_numzeros=numzeros-numzeros*x_zeros+numones*x_ones;
%You create a random number between 0.51 and 0.99, so you can compare
%the number of zeros to it. If q is not a random number, the same
%comment as the one for x and n.
q=rand(1,1)*0.48+0.51;
%You compare the number of zeros to the new random number. If it's
%bigger than it, then you raise the counter, if not, you do nothing.
%You repeat this 1000000 times so you can get a good probability
%estimation and the number you get is the probability of this occurring.
if new_numzeros>q
count=count+1;
end
end
%The probability of this event occurring
prob=count/1000000;
By my estimate, the probability is around 60% with random coefficients n, x and q. If they are fixed values, I cannot say, since I don't know their exact values.
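If n, x and q are fixed rather than random, the probability can also be written down directly instead of simulated. A sketch under a loud assumption: the number of zeros among the m-x untouched positions, called z0 here, is known. The question does not give it, so z0 and the other values below are hypothetical. The x re-drawn positions then contribute a Binomial(x, 1/2) number of zeros, and the tail sum is evaluated with gammaln to avoid overflow in the binomial coefficients.

```matlab
m  = 1000;   % sequence length (from the question)
x  = 400;    % number of positions that are re-drawn (hypothetical value)
z0 = 450;    % ASSUMPTION: zeros among the m-x untouched positions
q  = 650;    % required minimum number of zeros (hypothetical value)
bmin = max(q - z0, 0);    % zeros still needed from the re-drawn positions
b = bmin:x;               % admissible counts (empty if bmin > x)
% log of the Binomial(x, 1/2) probability mass at each b
logpmf = gammaln(x+1) - gammaln(b+1) - gammaln(x-b+1) - x*log(2);
P = sum(exp(logpmf));     % P(number of zeros >= q)
```

If the contents of the x positions are not known, z0 is itself random and P would have to be averaged over its distribution, which this sketch does not attempt.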

matrix dimensions wont agree when elements match/matlab

I am having a problem at the final calculation of my code, the very last part, where log is the natural log. I need to divide sigmafac by RD=facs.*log(log(facs)), i.e. robin=sigmafac./RD. My RD runs from 1 to 100, and so does my sigmafac. Why is there a matrix dimension mismatch?
I want each entry of sigmafac to be divided by the entry of RD for the same number (numbas). They all have the same dimension, so I do not see where the problem is coming from. I realize that RD(1)=-Inf; is that what is causing the problem? And how do I fix it?
code:
n=100;
primlist=2; % starting the prime number list
for numba=1:n;
if mod(2+numba,primlist)~=0
primlist=[primlist;2+numba]; %generating the prime number list
end
end
fac=1; %initializing the factorials
RD=0;
for numbas=2:n
%preallocating vectors for later use
prims=zeros(size(primlist));
pprims=zeros(size(primlist));
pow=prims;
for i=1:length(primlist) % identifying each primes in the primlist
for k=1:10
if mod(numbas,primlist(i).^k)==0
prims(i)=primlist(i); % sum of all the powers of prims, such that prims divide numbas
pow(i)=k; % collecting the exponents of primes
end
end
if primlist(i)<=numbas
pprims(i)=primlist(i); % primes less than or equal to numbas
end
end
% converting column vectors to row vector
PPRIMS=pprims';
PRIMS=prims';
POW=pow';
%Creating the vectors
PLN(numbas,:)=PPRIMS; % vector of primes less than or equal to number
PPV(numbas,:)=PRIMS; % prime divisor vector
PVE(numbas,:)=POW; % highest power of each primes for every number
RVE=cumsum(PVE); % the cummulative sum of the exponents
RVE(RVE~=0)=RVE(RVE~=0)+1; %selects each non zero element then add 1
%factorial
fac=fac*numbas;
facs(numbas)=fac; %storing the factorials
if facs==1
RD==1; % log(log(facs1))) does not exist
else RD=facs.*log(log(facs));
end
end
% setting up sum of divisor vector
NV=PLN.^RVE-1; % numerator part of sum of divisors vector
DV=PLN-1; % denominator part of sum of divisors
NV(NV==0)=1; % getting rid of 0 for elementwise product
DV(DV==-1)=1; % getting rid of -1 for elementwise product
sigmafac=prod(NV,2)./prod(DV,2); %sum of divisors
robin=(sigmafac)./(RD)

Whenever you get such an error, your first check should be to test
size(sigmafac)
size(RD)
In this case, you'll get
ans =
100 1
ans =
1 100
So they are NOT the same size. You need to take the transpose of one or the other and then your division will work fine.

Your sigmafac is 100x1 but your RD is 1x100 which is producing the error. If you want this to work just change
robin=(sigmafac)./(RD)
to
robin=(sigmafac)'./(RD)
This will make sigmafac a 1x100 (transpose) and then your vectors will have the same dimension and you will be able to do the division.
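A way to sidestep the orientation question altogether is to force both operands into columns with (:) before dividing. The variables below are small stand-ins, not the asker's data. As far as I know, MATLAB releases with implicit expansion (R2016b and later) would not raise an error for a column divided by a row but would silently return a full matrix, so checking size is still worthwhile there.

```matlab
sigmafacDemo = (1:5).';      % 5x1 column, stands in for sigmafac
RDdemo       = 2*(1:5);      % 1x5 row, stands in for RD
% (:) reshapes any vector into a column, so the shapes always agree
robinDemo = sigmafacDemo(:) ./ RDdemo(:);
```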

Alternative to using squareform (Matlab)

At the moment I am using the pdist function in MATLAB to calculate the Euclidean distances between various points in a three-dimensional Cartesian system. I'm doing this because I want to know which point has the smallest average distance to all the other points (the medoid). The syntax for pdist looks like this:
% calculate distances between all points
distances = pdist(m);
But because pdist returns a one-dimensional array of distances, there is no easy way to figure out directly which point has the smallest average distance. That is why I am using squareform and then calculating the smallest average distance, like so:
% convert found distances to matrix of distances
distanceMatrix = squareform(distances);
% find index of point with smallest average distance
[~,j] = min(mean(distanceMatrix,2));
The distances are averaged along each row (the matrix is symmetric, so rows and columns give the same result), and the variable j is the index of the row (and the point) with the smallest average distance.
This works, but squareform takes a lot of time (this piece of code is repeated thousands of times), so I am looking for a way to optimise it. Does anyone know of a faster way to deduce the point with the smallest average distance from the results of pdist?

I think that for your task the squareform function is the best approach from a vectorization point of view. If you look at the content of this function with
edit squareform
you will see that it performs a lot of checks, which of course take time. Since you know your input to squareform and can be sure it will work, you can create your own function with just the core of squareform.
[r, c] = size(m);
distanceMatrix = zeros(r);
distanceMatrix(tril(true(r),-1)) = distances;
distanceMatrix = distanceMatrix + distanceMatrix';
Then run the same code as you did to find the medoid.
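To see the core of squareform at work without any toolbox call, here is a tiny worked sketch. The distance vector is typed in by hand for three assumed points (0,0), (3,0) and (0,4), in the order pdist would return it:

```matlab
distances = [3 4 5];     % pairs (2,1), (3,1), (3,2) of the assumed points
r = 3;                   % number of points
distanceMatrix = zeros(r);
% fill the strictly lower triangle column by column, matching pdist order
distanceMatrix(tril(true(r),-1)) = distances;
distanceMatrix = distanceMatrix + distanceMatrix.';   % mirror to the upper triangle
[~, j] = min(mean(distanceMatrix, 2));                % index of the medoid
```

Here the first point has the smallest row sum (3+4), so j comes out as 1.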

Here's an implementation that doesn't require a call to squareform:
N1 = 10;
dim = 5;
% generate points
X = randn(N1, dim);
% find mean distance
for iter=N1:-1:1
d_mean(iter) = mean(pdist2(X(iter,:),X([1:(iter-1) (iter+1):end],:),'euclidean'));
% D(iter,:) = pdist2(X(iter,:),X([1:(iter-1) (iter+1):end],:),'euclidean');
end
[val ind] = min(d_mean);
But without knowing more about your problem, I have no idea if it would be faster.
If this is the lynchpin for your program's performance, you may need to consider other speedup options like mex.
Good luck.

When pdist computes distances between pairs of observations (1,2,...,n), the distances are arranged in the following order:
(2,1), (3,1), ..., (n,1), (3,2), ..., (n,2), ..., (n,n-1)
To demonstrate this, try the following:
> X = [.2 .1 .7 .5]';
> D = pdist(X)
.1 .5 .3 .6 .4 .2
In this example, X stores n=4 observations. The result, D, is a vector of distances between observations (2,1), (3,1), (4,1), (3,2), (4,2), (4,3). This arrangement corresponds with the entries of the lower triangular part of the following n-by-n matrix:
M=
0 0 0 0
.1 0 0 0
.5 .6 0 0
.3 .4 .2 0
Notice that D(1)=M(2,1), D(2)=M(3,1) and so on. So, one way to get the pair of indices in M that correspond with D(k) would be to compute the linear index of D(k) in M. This could be done as follows:
% matrix size
n = 4;
% r(j) is the no. of elements in cols 1..j, belonging to the upper triangular part
r = cumsum(1:n-1);
% p(j) is the no. elements in cols 1..j, belonging to the lower triangular part
p = cumsum(n-1:-1:1);
% The linear index of value D(k)
q = find(p >= k, 1);
% The subscript indices of value D(k)
[i j] = ind2sub([n n], k + r(q));
Notice that n, r and p need to be set only once. From that point, you can find the index for any given k using the last two lines. Let's check this:
for k = 1:6
q = find(p >= k, 1);
[i, j] = ind2sub([n n], k + r(q));
fprintf('D(%d) is the distance between observations (%d %d)\n', k, i, j);
end
Here's the output:
D(1) is the distance between observations (2 1)
D(2) is the distance between observations (3 1)
D(3) is the distance between observations (4 1)
D(4) is the distance between observations (3 2)
D(5) is the distance between observations (4 2)
D(6) is the distance between observations (4 3)

There is no need to use squareform:
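distances = pdist(m);
n = size(m,1); % number of points
% row and column subscripts of every pair, in the order pdist returns them
[rowIdx, colIdx] = find(tril(true(n),-1));
% add each distance to both points of its pair
sumDist = accumarray([rowIdx; colIdx], [distances(:); distances(:)]);
meanDist = sumDist / n; % same values as mean(squareform(distances),2)
[~,j] = min(meanDist);
Here j is the index of the point with the smallest average distance. The accumulation is vectorised, so no loop over the points is needed.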
