I want to count how many times each value occurs in an array. I have code that works:
Range = [1:10^3];% [1:10^6];
N = 10^2;% 10^8
Data = randi([Range(1),Range(end)],N,1);
Counts = nan(numel(Range),1);
for iRange = 1:numel(Range)
Counts(iRange) = sum(Data==Range(iRange));
end
Could you help me make this code faster?
I feel it should be possible with unique or hist, but I could not find a solution.
N = histcounts(Data,Range)
gives me 999 counts instead of 1000.
As Ander Biguri stated in a comment, histcounts is what you need.
The function counts how many values of X (Data in your example) fall into each bin between two edges, where the bins are defined as follows:
The value X(i) is in the kth bin if edges(k) ≤ X(i) < edges(k+1).
The last bin also includes its right edge.
This means:
To count N distinct values, you need N+1 edges.
Each bin should start at the value you want it to include (1 in [1,2), 2 in [2,3), etc.).
In your example:
Counts = histcounts(Data,Range(1):(Range(end)+1))';
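To make the N+1 edges rule concrete, here is a small sketch with made-up data (the values in Data are purely illustrative). Passing Range itself as the edges gives one bin fewer than there are values, and the last bin then swallows both 4 and 5; appending Range(end)+1 gives exactly one bin per value.

```matlab
Data  = [1 2 2 3 3 3 5];              % illustrative data; 4 never occurs
Range = 1:5;                          % five distinct values to count
% Range as edges: only 4 bins, the last one is [4,5] and also catches 5
tooFew = histcounts(Data, Range)
% One extra edge: bins [1,2), [2,3), [3,4), [4,5), [5,6]
edges  = Range(1):(Range(end) + 1);
Counts = histcounts(Data, edges)'     % one count per value in Range
```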
I wanted to point out an issue with this code:
Counts = nan(numel(Range),1);
for iRange = 1:numel(Range)
Counts(iRange) = sum(Data==Range(iRange));
end
It looks like a single loop, but == and sum each go over all elements of the array, so the work grows with numel(Range)*N. That is much more expensive than a loop that visits each element only once, especially when N is large:
Counts = zeros(numel(Range),1);
for elem = Data(:).'
Counts(elem) = Counts(elem) + 1;
end
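The element-wise loop uses each value directly as an index into Counts, which only works because Range starts at 1. A vectorised alternative is accumarray, sketched below under the assumption that Range is a contiguous block of integers; subtracting Range(1)-1 maps the smallest value to index 1, and the same offset (Counts(elem - Range(1) + 1)) would make the loop general too.

```matlab
Range = 1:10^3;   % assumed: a contiguous block of integers
N = 10^2;
Data = randi([Range(1), Range(end)], N, 1);
% Data - Range(1) + 1 maps the smallest possible value to index 1, so this
% also works when Range does not start at 1; each occurrence adds 1
Counts = accumarray(Data - Range(1) + 1, 1, [numel(Range), 1]);
```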
Related
I would like to generate an array which contains all ordered samples of length k taken from a set of n elements {a_1,...,a_n}, that is all the k-tuples (x_1,...,x_k) where each x_j can be any of the a_i (repetition of elements is allowed), and whose total number is n^k.
Is there a built-in function in Matlab to obtain it?
I have tried to write a code that iteratively uses the datasample function, but I couldn't get what desired so far.
An alternative way to get all the tuples is based on the base-n representation of integers.
If you take the k-digit base-n representation of every integer from 0 to n^k - 1, it gives you every possible set of k indexes, keeping in mind that these indexes start at 0.
Now, implementing this idea is quite straightforward. You can use dec2base if n is at most 10 (its third argument pads every row to k digits):
X = A(dec2base(0:(n^k-1), n, k)-'0'+1);
For n between 11 and 36, you can still use dec2base, but you must take care of letters, as there is a gap in character codes between '9' and 'A' ('A'-'0' is 17, not 10):
D = dec2base(0:(n^k-1), n, k)-'0'; % digit values, but letters start at 17
D(D>=17) = D(D>=17)-7; % map 'A','B',... to 10,11,...
X = A(D+1);
For n above 36, dec2base no longer applies, so you must use custom code to retrieve the base-n digits of each integer, like this one. But IMO you may rarely need this, as the n^k rows quickly become huge.
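For n above 36, one way to get the digits without dec2base is to peel them off with mod and floor, which works for any base. This is a minimal sketch, not the code the link refers to; A, n and k follow the notation above and the example values are arbitrary. Rows come out in the same order as with dec2base.

```matlab
A = [10 20 30];   % example set of n elements
k = 2;            % example tuple length
n = numel(A);
idx = (0:n^k-1).';             % one integer per tuple
D = zeros(n^k, k);             % base-n digits, most significant digit first
for j = k:-1:1
    D(:, j) = mod(idx, n);     % current least significant digit
    idx = floor(idx / n);      % drop that digit
end
X = reshape(A(D + 1), size(D)) % replace each 0-based digit by its element
```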
What you are looking for is ndgrid: it generates the grid elements in any dimension.
If k is fixed when you write the code, get all the index combinations this way:
[X_1, ..., X_k] = ndgrid(1:n);
Then build the matrix X from vector A:
X = [A(X_1(:)), ..., A(X_k(:))];
If k is a parameter, my advice would be to look at the code of ndgrid and adapt it in a new function so that the output is a matrix of values instead of storing them in varargout.
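Rather than adapting ndgrid's source, one common idiom for a run-time k is to collect the k outputs of ndgrid in a cell array. This is a sketch with arbitrary example values for A and k; note that ndgrid varies the first index fastest, so the rows come out in a different order than with the dec2base approach.

```matlab
A = [10 20 30];   % example set
k = 2;            % tuple length, only known at run time
n = numel(A);
C = cell(1, k);
[C{:}] = ndgrid(1:n);                 % k outputs -> k-dimensional index grids
I = reshape(cat(k + 1, C{:}), [], k); % column j holds the j-th index of every tuple
X = reshape(A(I), size(I))            % n^k-by-k matrix of tuples
```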
What about this solution? I don't know if it's as fast as yours, but do you think it is correct?
function Y = ordsampwithrep(X,K)
%ordsampwithrep Ordered samples with replacement
% Generates an array Y containing in its rows all ordered samples with
% replacement of length K with elements of vector X
X = X(:);
nX = length(X);
Y = zeros(nX^K,K);
Y(1,:) = datasample(X,K)';
k = 2;
while k < nX^K +1
temprow = datasample(X,K)';
%checknew = find (temprow == Y(1:k-1,:));
if not(ismember(temprow,Y(1:k-1,:),'rows'))
Y(k,:) = temprow;
k = k+1;
end
end
end
How can I find all numbers that are equal to the sum of the factorials of their digits (e.g. 145 = 1! + 4! + 5! = 1 + 24 + 120 = 145) using MATLAB?
I want to chop off the digits, add their factorials together and compare the sum with the original number. If the factorial sum equals the original number, the number is one of the solutions and must be kept. I can't code my idea; how can I code it? Is this approach right?
Thanks
The main reason I am posting this answer is that I can't leave the use of eval in the previous answer without a decent alternative.
Here is a small function to check this for any given (integer) n:
isFact = @(n) n==sum(factorial(int2str(n)-'0'));
Explanation:
int2str(n)-'0': "chop off digits"
sum(factorial(...)): "add the factorial of the digits together"
n==...: "compare it with the original number"
You can now plug it into a loop to find all the numbers from 1 to maxInt:
maxInt = 100000; % just for the example
solution = false(1,maxInt); % preallocating memory
for k = 1:maxInt
solution(k) = isFact(k);
end
find(solution) % find all the TRUE indices
The result:
ans =
1 2 145 40585
The loop above was written to be simple. If you want more efficiency and flexibility (e.g. not checking every number from 1 to maxInt, or checking an array of any shape), you can change it to:
% generating a set of random numbers with no repetitions:
Vec2Check = unique(randi(1000,1,1000)); % you can change that to any array
for k = 1:numel(Vec2Check)
if isFact(Vec2Check(k))
Vec2Check(k) = Vec2Check(k)+0.1;
end
end
solution = Vec2Check(Vec2Check>round(Vec2Check))-0.1
The addition of 0.1 serves as a 'flag' that marks the numbers for which isFact returns true. We then extract them by comparing the vector to its rounded version.
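If the goal is just to pick out the qualifying elements, a logical mask avoids the 0.1 flag and any floating-point arithmetic on the values. A sketch using the same isFact; the fixed Vec2Check here is only an example so the result is reproducible:

```matlab
isFact = @(n) n == sum(factorial(int2str(n) - '0'));
Vec2Check = [3 145 200 1 40585 2];    % example input; any numeric array works
mask = arrayfun(isFact, Vec2Check);   % logical array, same shape as Vec2Check
solution = Vec2Check(mask)            % keeps the original order
```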
You can even go with a one-line solution:
solution = nonzeros(arrayfun(@(n) n.*(n==sum(factorial(int2str(n)-'0'))),Vec2Check))
The following snippet finds the numbers up to 1000 satisfying this condition.
numbers = [];
for i=1:1000
number_char = int2str(i);
fsum = 0; % not named "sum", which would shadow the built-in function
for j=1:length(number_char)
fsum = fsum + factorial(eval(number_char(j)));
end
if (fsum == i)
numbers(end+1) = i;
end
end
disp(numbers)
This should yield:
1 2 145
Note that if (log10(n)+1)*9! is less than n, then there is no number satisfying the condition larger than n.
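To see where this bound kicks in, write a d-digit number as n with digits a_1, ..., a_d (notation introduced here). Each digit contributes at most 9!, while n itself is at least 10^{d-1}:

```latex
\begin{aligned}
n &\ge 10^{d-1}, \qquad \sum_{i=1}^{d} a_i! \le d \cdot 9! = 362880\,d \\
d = 8:&\quad 8 \cdot 9! = 2903040 < 10^{7} \\
\Rightarrow&\quad n \le 7 \cdot 9! = 2540160 \text{ for every solution}
\end{aligned}
```

Since 10^{d-1} grows tenfold per extra digit while d·9! only grows by 9!, the inequality stays true for every d ≥ 8, so a search up to 2540160 is enough to find every solution.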
I have two lists of 2-dimensional points given as M x 2 and N x 2 matrices, respectively, with M and N possibly being very large.
What is the fastest way to determine how many of them are equal?
I am not sure whether you want to count repeated entries, but if not, you could use intersect or a fairly intuitive algorithm based on sorting (see below). I would avoid a nested-loop version.
function test_compareVecs()
%% create some random data
N = 31415;
M1 = 100000;
M2 = 200000;
vec = rand(N,2);
v1 = [rand(M1-N,2); vec];
v2 = [rand(M2-N,2); vec];
v1 = v1(randperm(M1),:);
v2 = v2(randperm(M2),:);
%% intersect
disp('intersect:');
tic
s = size(intersect(v1,v2,'rows'),1);
toc;
s
%% alternative approach
disp('alternative approach:');
tic;
s = compareVecs(v1,v2);
toc;
s
end
function s = compareVecs(v1,v2)
%% create help vector
help_vec = [[v1,zeros(size(v1,1),1)]; ...
[v2,ones(size(v2,1),1)]];
%% sort by first column
% note: for some reason "sortrows(help_vec,1)" is slower
hash_vec = help_vec(:,1); % dummy hash
[~,sidx] = sort(hash_vec);
help_vec = help_vec(sidx,:);
%% diff + compare
help_vec = diff(help_vec);
s = sum(help_vec(:,1) == 0 & ...
help_vec(:,2) == 0 & ...
help_vec(:,3) ~= 0);
end
Result
intersect:
Elapsed time is 0.145717 seconds.
s = 31415
alternative approach:
Elapsed time is 0.048084 seconds.
s = 31415
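One caveat with compareVecs, as far as I can tell: it sorts by the first column only, so when different points share an x value (common with integer coordinates), equal points are not guaranteed to end up next to each other and the count can come out too low. For example, v1 = [1 2; 1 3] and v2 = [1 3; 1 2] give 1 instead of 2. A simple alternative that counts distinct common points with exact comparison, sketched with made-up lists:

```matlab
% Example lists; each row is an (x, y) point
v1 = [1 2; 1 3; 5 6; 1 2];
v2 = [1 3; 1 2; 7 8];
u1 = unique(v1, 'rows');             % drop repeated points within v1
s = sum(ismember(u1, v2, 'rows'))    % distinct points of v1 also present in v2
```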
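Compute all pairwise distances with pdist2 and then count the pairs with zero distance. If the coordinates are floating-point values, you may want to use a tolerance instead of comparing against zero. Note that the distance matrix has M*N entries, so this only suits lists of moderate size: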
%// Data:
M = 10;
N = 8;
listM = randi(10,M,2)-1;
listN = randi(10,N,2)-1;
tol = 1e-6;
%// Distance matrix:
d = pdist2(listM, listN);
%// Count:
count = sum(d(:)<tol);
This should work irrespective of the order of the points in each list, or their lengths. It is a hash-table/dictionary solution that should be fast, with memory use linear in the lengths of the lists. Please note that the syntax below may not be perfect, but a quick look at the reference for the main data structures mentioned should make corrections trivial.
(1) Populate a dictionary-like containers.Map so that the key is a unique function of the point, e.g. [num2str(M(i,1)) '-' num2str(M(i,2))]. For non-integer coordinates, num2str's default precision can map distinct points to the same key; num2str(x,17) avoids this.
(2) Then go over all elements of the second list, create the key just as in (1) and check whether it exists. If it does, set map(key)=1, else set it to 0. In the end, all the keys of common points will have 1s stored, and the rest will be zeros.
(3) Finish by summing the values of the map (something like sum(cell2mat(values(map)))), which gives the total number of unique intersections between the two sets, irrespective of the order in which the points appear in each list.
OBS: if you don't want to count just unique intersections but all repeated points, in (2), rather than making map(key)=1, add 1 to map(key). The rest is the same.
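Steps (1) to (3) can be sketched as follows with made-up lists P and Q. The key function pointKey is a hypothetical helper that prints both coordinates with sprintf at full double precision, so distinct floating-point points get distinct keys. Points of the second list that are not already in the map are simply skipped, which gives the same sum as storing a 0 for them.

```matlab
% Example point lists (M-by-2 and N-by-2); replace with your own data
P = [1 2; 3 4; 5 6];
Q = [3 4; 7 8; 1 2; 1 2];
% Hypothetical key helper: both coordinates at full precision, comma-separated
pointKey = @(p) sprintf('%.17g,%.17g', p(1), p(2));
seen = containers.Map('KeyType', 'char', 'ValueType', 'double');
% (1) every point of the first list is stored with value 0
for i = 1:size(P, 1)
    seen(pointKey(P(i, :))) = 0;
end
% (2) points of the second list that already have a key are marked with 1
for i = 1:size(Q, 1)
    key = pointKey(Q(i, :));
    if isKey(seen, key)
        seen(key) = 1;
    end
end
% (3) the sum of the stored values is the number of distinct common points
vals = values(seen);
count = sum([vals{:}])
```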
I'm slowly working my way through Problem 23 of Project Euler, but I've run into a snag. Problem 23 asks for the sum of all positive integers that cannot be written as the sum of two abundant numbers.
First, here's my code:
function [divisors] = SOEdivisors(num)
%SOEDIVISORS This function finds the proper divisors of a number using the sieve
%of eratosthenes
%check for primality
if isprime(num) == 1
divisors = [1];
%if not prime find divisors
else
divisors = [0 2:num/2]; %hard code a zero at one.
for i = 2:num/2
if divisors(i) %if divisors i ~= 0
%if the remainder is not zero it is not a divisor
if rem(num, divisors(i)) ~= 0
%remove that number and all its multiples from the list
divisors(i:i:fix(num/2)) = 0;
end
end
end
%add 1 back and remove all zeros
divisors(1) = 1;
divisors = divisors(divisors ~= 0);
end
end
This function finds abundant numbers
function [abundantvecfinal] = abundantnum(limitnum)
%ABUNDANTNUM creates a vector of abundant numbers up to a limit.
abundantvec = [];
%abundant number count
count = 1;
%test for abundance
for i = 1:limitnum
%find divisors
divisors = SOEdivisors(i);
%if sum of divisors is greater than number it is abundant, add it to
%vector
if i < sum(divisors)
abundantvec(count) = i;
count = count + 1;
end
end
abundantvecfinal = abundantvec;
end
And this is the main script
%This finds the sum of all numbers that cannot be written as the sum of two
%abundant numbers and under 28123
%get abundant numbers
abundant = abundantnum(28153);
%total non abundant numbers
total = 0;
%sums
sums = [];
%count moves through the newsums vector allowing for a new space for each
%new sum
count = 1;
%get complete list of possible sums under using abundant numbers under
%28123 then store them in a vector
for i = 1:length(abundant)
for x = 1:length(abundant)
%make sure it is not the same number being added to itself
if i ~= x
sums(count) = abundant(i) + abundant(x);
count = count + 1;
end
end
end
%setdiff function compares two vectors and removes all similar elements
total = sum(setdiff(1:28153, sums));
disp(total)
The first problem is that it gives me the wrong answer. I know I'm getting the correct proper divisors and the correct abundant numbers, so the problem probably lies in the main script, almost certainly in the creation of the abundant sums. I was hoping someone might spot an error I haven't been able to find.
Beyond that, the code is slow due to the multiple for loops, so I'm also looking for ways to do problems like this more efficiently.
Thanks!
Well, I don't have enough reputation to just comment. Why are you ruling out adding the same number to itself? The problem statement gives the example 12+12=24.
I also see no reason for x ever to be less than i: you don't need to sum the same two numbers twice.
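Putting both remarks together (allow x == i, only consider x >= i), a corrected and faster version of the main script could look like this. It is a sketch: to run on its own it computes the abundant numbers with a simple divisor-sum sieve instead of SOEdivisors, and it marks reachable sums in a logical array instead of growing a sums vector.

```matlab
limit = 28123;
% Sum of proper divisors of every n <= limit: add d to each multiple 2d, 3d, ...
sigma = zeros(1, limit);
for d = 1:floor(limit / 2)
    sigma(2*d:d:limit) = sigma(2*d:d:limit) + d;
end
abundant = find(sigma > (1:limit));
% Mark every sum abundant(i) + abundant(x) with x >= i (so x == i is included)
canWrite = false(1, limit);
for i = 1:numel(abundant)
    s = abundant(i) + abundant(i:end);
    canWrite(s(s <= limit)) = true;
end
total = sum(find(~canWrite))   % numbers that are not such a sum
```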
The heading might be slightly confusing, but what I want to do is the following:
I have function inputs x and t, outputs y (i.e. y = f(x,t)), and sets of ranges xr and tr, and I want to compute:
v = zeros(1,length(xr)-1)
for kk=1:(length(xr)-1)
ix = x >= xr(kk) & x < xr(kk+1) & t >= tr(kk) & t < tr(kk+1)
v(kk) = sum(y(ix));
end
This is very slow, while histc, which does almost the same thing (except that it counts the entries in each interval instead of summing the function output), is very fast. How can this be implemented faster? I tried using arrayfun, but it only gave a 25% increase in speed.
Thanks,
If you use histc with two output arguments, the second output will give you the bin numbers for each data entry. You can use the bin numbers to sum up the entries belonging to each bin, for instance, using bsxfun or accumarray.
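[~, id] = histc(x, xr); % bin index of each x; 0 if x lies outside xr
ok = id > 0 & id < numel(xr); % accumarray needs positive indices; x == xr(end) is excluded, as in the loop
sub = id(ok); val = y(ok);
v = accumarray(sub(:), val(:), [numel(xr)-1, 1]).'; % sum of y in each bin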
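The histc/accumarray snippet bins on x only, whereas the question's loop also requires t to fall into the same range kk. A sketch that handles both, assuming xr and tr have the same length and x, t, y have the same number of elements (the data below are made up):

```matlab
% Example data (assumed): samples x, t, y and equally long range vectors
x  = [0.5 1.5 2.5 1.2];
t  = [0.5 0.5 2.5 1.2];
y  = [1 2 3 4];
xr = [0 1 2 3];
tr = [0 1 2 3];
nb = numel(xr) - 1;              % number of ranges, as in the loop
[~, ix] = histc(x(:), xr);       % x-range index of each sample (0 if outside)
[~, it] = histc(t(:), tr);       % t-range index of each sample (0 if outside)
% The loop keeps sample i in range kk only if x(i) AND t(i) fall in range kk;
% ix <= nb also drops x == xr(end), which histc puts in an extra bin
ok = ix > 0 & ix <= nb & ix == it;
yv = y(:);
v = accumarray(ix(ok), yv(ok), [nb, 1]).'   % row vector, like v in the loop
```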