Matlab: fast way to sum ones in binary numbers with Sparse structure? - matlab

Most answers address only the already-answered question about Hamming weights and ignore the point about find and handling the sparsity. Apparently the answer by Shai here addresses the point about find, but I have not yet been able to verify it. My answer here does not use the ingenuity of other answers, such as the bit shifting, but it is a good enough example answer.
Input
>> mlf=sparse([],[],[],2^31+1,1);mlf(1)=10;mlf(10)=111;mlf(77)=1010;
>> transpose(dec2bin(find(mlf)))
ans =
001
000
000
011
001
010
101
Goal
1
0
0
2
1
1
2
Is there a fast way to count the ones in binary numbers with the sparse structure?

You can do this in tons of ways. The simplest I think would be
% Example data
F = [268469248 285213696 536904704 553649152];
% Solution 1
sum(dec2bin(F)-'0',2)
And the fastest (as found here):
% Solution 2
w = uint32(F');
p1 = uint32(1431655765);
p2 = uint32(858993459);
p3 = uint32(252645135);
p4 = uint32(16711935);
p5 = uint32(65535);
w = bitand(bitshift(w, -1), p1) + bitand(w, p1);
w = bitand(bitshift(w, -2), p2) + bitand(w, p2);
w = bitand(bitshift(w, -4), p3) + bitand(w, p3);
w = bitand(bitshift(w, -8), p4) + bitand(w, p4);
w = bitand(bitshift(w,-16), p5) + bitand(w, p5);
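As a quick sanity check, the two solutions can be compared on the example data. The sketch below is only an illustration: it writes the same five masks in hexadecimal (0x55555555, 0x33333333, 0x0F0F0F0F, 0x00FF00FF, 0x0000FFFF) and runs the five steps in a loop. The variable names masks and shifts are my own and do not come from the linked source.

```matlab
% Example data from the answer above
F = [268469248 285213696 536904704 553649152];

% Same constants as p1..p5, written in hex (names are illustrative)
masks  = uint32(hex2dec({'55555555', '33333333', '0F0F0F0F', '00FF00FF', '0000FFFF'}));
shifts = [1 2 4 8 16];

w = uint32(F');
for k = 1:numel(shifts)
    % add neighbouring bit fields pairwise: 1-bit fields, then 2, 4, 8, 16
    w = bitand(bitshift(w, -shifts(k)), masks(k)) + bitand(w, masks(k));
end

% Solution 1 as the reference
ref = sum(dec2bin(F) - '0', 2);
assert(isequal(double(w), ref));
```

After the last step each element of w holds the number of set bits of the corresponding input, so both solutions agree on this data.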

According to your comments, you convert a vector of numbers to binary string representations using dec2bin. Then you can achieve what you want as follows, where I'm using vector [10 11 12] as an example:
>> sum(dec2bin([10 11 12])=='1',2)
ans =
2
3
2
Or equivalently,
>> sum(dec2bin([10 11 12])-'0',2)
For speed, you could avoid dec2bin like this (uses modulo-2 operations, inspired by the dec2bin code):
>> sum(rem(floor(bsxfun(@times, [10 11 12].', pow2(1-N:0))),2),2)
ans =
2
3
2
where N is the maximum number of binary digits you expect.

If you really want speed, I think a look-up table would be handy. You can simply map, for 0..255, how many ones each value has. Do this once, and then you only need to decompose an int into its bytes, look each byte up in the table, and add the results. There is no need to go through strings.
An example:
>> LUT = sum(dec2bin(0:255)-'0',2); % construct the look up table (only once)
>> ii = uint32( find( mlf ) ); % get the numbers
>> vals = LUT( mod( ii, 256 ) + 1 ) + ... % lowest byte
LUT( mod( bitshift( ii, -8 ), 256 ) + 1 ) + ... % shift instead of ii/256: integer division rounds
LUT( mod( bitshift( ii, -16 ), 256 ) + 1 ) + ...
LUT( mod( bitshift( ii, -24 ), 256 ) + 1 ); % highest byte
Using typecast (as suggested by Amro):
>> vals = sum( reshape(LUT(double(typecast(ii,'uint8'))+1), 4, [] ), 1 )';
Run time comparison
>> ii = uint32(randi(intmax('uint32'),100000,1));
>> tic; vals1 = sum( reshape(LUT(double(typecast(ii,'uint8'))+1), 4, [] ), 1 )'; toc % double() avoids uint8 saturation at 255+1
>> tic; vals2 = sum(dec2bin(ii)-'0',2); toc
>> dii = double(ii); % type issues
>> tic; vals3 = sum(rem(floor(bsxfun(@times, dii, pow2(1-32:0))),2),2); toc
Results:
Elapsed time is 0.006144 seconds. <-- this answer
Elapsed time is 0.120216 seconds. <-- using dec2bin
Elapsed time is 0.118009 seconds. <-- using rem and bsxfun

Here is an example to show @Shai's idea of using a lookup table:
% build lookup table for 8-bit integers
lut = sum(dec2bin(0:255)-'0', 2);
% get indices
idx = find(mlf);
% break indices into 8-bit integers and apply LUT
nbits = lut(double(typecast(uint32(idx),'uint8')) + 1);
% sum number of bits in each
s = sum(reshape(nbits,4,[]))
You might have to switch to uint64 instead if you have really large sparse arrays with indices outside the 32-bit range.
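For indices beyond the 32-bit range, the same lookup-table idea should carry over by splitting each index into 8 bytes instead of 4. This is a minimal sketch, assuming the indices fit in uint64; the example values in idx are made up for illustration and are not taken from mlf.

```matlab
% build lookup table for 8-bit integers (same as above)
lut = sum(dec2bin(0:255) - '0', 2);

% hypothetical indices, some of them outside the 32-bit range
idx = uint64([1; 10; 77; 2^31 + 1; 2^40 + 3]);

% break each index into 8 bytes and apply the LUT
nbits = lut(double(typecast(idx, 'uint8')) + 1);

% sum the 8 per-byte counts of each index
s = sum(reshape(nbits, 8, []), 1)';
```

The only changes from the 32-bit version are the uint64 cast and the reshape to 8 rows.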
EDIT:
Here is another solution for you using Java:
idx = find(mlf);
s = arrayfun(@java.lang.Integer.bitCount, idx);
EDIT#2:
Here is yet another solution implemented as C++ MEX function. It relies on std::bitset::count:
bitset_count.cpp
#include "mex.h"
#include <bitset>
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
// validate input/output arguments
if (nrhs != 1) {
mexErrMsgTxt("One input argument required.");
}
if (!mxIsUint32(prhs[0]) || mxIsComplex(prhs[0]) || mxIsSparse(prhs[0])) {
mexErrMsgTxt("Input must be a 32-bit integer dense matrix.");
}
if (nlhs > 1) {
mexErrMsgTxt("Too many output arguments.");
}
// create output array
mwSize N = mxGetNumberOfElements(prhs[0]);
plhs[0] = mxCreateDoubleMatrix(N, 1, mxREAL);
// get pointers to data
double *counts = mxGetPr(plhs[0]);
uint32_T *idx = reinterpret_cast<uint32_T*>(mxGetData(prhs[0]));
// count bits set for each 32-bit integer number
for(mwSize i=0; i<N; i++) {
std::bitset<32> bs(idx[i]);
counts[i] = bs.count();
}
}
Compile the above function as mex -largeArrayDims bitset_count.cpp, then run it as usual:
idx = find(mlf);
s = bitset_count(uint32(idx))
I decided to compare all the solutions mentioned so far:
function [t,v] = testBitsetCount()
% random data (uint32 vector)
x = randi(intmax('uint32'), [1e5,1], 'uint32');
% build lookup table (done once)
LUT = sum(dec2bin(0:255,8)-'0', 2);
% functions to compare
f = {
@() bit_twiddling(x) % bit twiddling method
@() lookup_table(x,LUT); % lookup table method
@() bitset_count(x); % MEX-function (std::bitset::count)
@() dec_to_bin(x); % dec2bin
@() java_bitcount(x); % Java Integer.bitCount
};
% compare timings and check results are valid
t = cellfun(@timeit, f, 'UniformOutput',true);
v = cellfun(@feval, f, 'UniformOutput',false);
assert(isequal(v{:}));
end
function s = lookup_table(x,LUT)
s = sum(reshape(LUT(double(typecast(x,'uint8'))+1),4,[]))';
end
function s = dec_to_bin(x)
s = sum(dec2bin(x,32)-'0', 2);
end
function s = java_bitcount(x)
s = arrayfun(@java.lang.Integer.bitCount, x);
end
function s = bit_twiddling(x)
p1 = uint32(1431655765);
p2 = uint32(858993459);
p3 = uint32(252645135);
p4 = uint32(16711935);
p5 = uint32(65535);
s = x;
s = bitand(bitshift(s, -1), p1) + bitand(s, p1);
s = bitand(bitshift(s, -2), p2) + bitand(s, p2);
s = bitand(bitshift(s, -4), p3) + bitand(s, p3);
s = bitand(bitshift(s, -8), p4) + bitand(s, p4);
s = bitand(bitshift(s,-16), p5) + bitand(s, p5);
end
The times elapsed in seconds:
t =
0.0009 % bit twiddling method
0.0087 % lookup table method
0.0134 % C++ std::bitset::count
0.1946 % MATLAB dec2bin
0.2343 % Java Integer.bitCount

This gives you the rowsums of the binary numbers from the sparse structure.
>> mlf=sparse([],[],[],2^31+1,1);mlf(1)=10;mlf(10)=111;mlf(77)=1010;
>> transpose(dec2bin(find(mlf)))
ans =
001
000
000
011
001
010
101
>> sum(ismember(transpose(dec2bin(find(mlf))),'1'),2)
ans =
1
0
0
2
1
1
2
Hopefully someone can find a faster row summation!

Mex it!
Save this code as countTransBits.cpp:
#include "mex.h"
void mexFunction( int nout, mxArray* pout[], int nin, mxArray* pin[] ) {
mxAssert( nin == 1 && mxIsSparse(pin[0]) && mxGetN( pin[0] ) == 1,
"expecting single sparse column vector input" );
mxAssert( nout == 1, "expecting single output" );
// set output, assuming 32 bits, set to 64 if needed
pout[0] = mxCreateNumericMatrix( 32, 1, mxUINT32_CLASS, mxREAL );
unsigned int* counter = (unsigned int*)mxGetData( pout[0] );
for ( int i = 0; i < 32; i++ ) {
counter[i] = 0;
}
// start working
mwIndex *pIr = mxGetIr( pin[0] );
mwIndex* pJc = mxGetJc( pin[0] );
double* pr = mxGetPr( pin[0] );
for ( mwSize i = pJc[0]; i < pJc[1]; i++ ) {
if ( pr[i] != 0 ) {// make sure entry is non-zero
unsigned int entry = pIr[i] + 1; // cast to unsigned int and add 1 for 1-based indexing in Matlab
int bit = 0;
while ( entry != 0 && bit < 32 ) {
counter[bit] += ( entry & 0x1 ); // count the lsb
bit++;
entry >>= 1; // shift right
}
}
}
}
Compile it in Matlab
>> mex -largeArrayDims -O countTransBits.cpp
Run the code
>> countTransBits( mlf )
Note that the output counts come in 32 bins, ordered from LSB to MSB.
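For comparison, the same per-bit-position counts can be sketched in plain MATLAB with bitget. This is only an illustration of what the MEX function computes, not a replacement for it; idx is hard-coded here to the values that find(mlf) returns for the question's example, and the names nbits and counts are my own.

```matlab
idx = uint32([1; 10; 77]);   % find(mlf) for the question's example

nbits  = 32;                 % assumed index width, as in the MEX code
counts = zeros(nbits, 1);
for b = 1:nbits
    % bitget uses 1-based bit positions; bit 1 is the least significant bit
    counts(b) = sum(double(bitget(idx, b)));
end
```

Read from bin 7 down to bin 1, the first seven counts give 1 0 0 2 1 1 2, which matches the goal stated in the question (listed there from MSB to LSB).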

The bitcount FEX contribution offers a solution based on the lookup table approach, but is better optimized. It runs more than twice as fast as the bit twiddling method (i.e. the fastest pure-MATLAB method reported by Amro) over a 1 million uint32 vector, using R2015a on my old laptop.

Related

Matlab SHA-1 custom implementation doesn't give the right result

I am trying to implement a SHA-1 algorithm in Matlab.
I know I can use System.Security.Cryptography.HashAlgorithm.Create('SHA1');, but that relies on .NET, which I'd like to avoid.
I also found this thread, which suggests using MessageDigest.getInstance("SHA-1"), but the output is not correct.
I haven't found any other portable way.
Here's my code, based on the Wikipedia pseudocode; its result is also off. The only solution that gives a result matching online SHA-1 tools is the .NET function.
Can somebody see the error in my function?
function [hh] = sha1(bytes_in)
% Note 1: All variables are unsigned 32-bit quantities and wrap modulo 2^32 when calculating, except for
% ml, the message length, which is a 64-bit quantity, and
% hh, the message digest, which is a 160-bit quantity.
% Note 2: All constants in this pseudo code are in big endian.
% Within each word, the most significant byte is stored in the leftmost byte position
%
% Initialize variables:
bytes_in = squeeze(uint8(bytes_in));
bytes_in = reshape(bytes_in, length(bytes_in), 1);
h0 = uint32(0x67452301);
h1 = uint32(0xEFCDAB89);
h2 = uint32(0x98BADCFE);
h3 = uint32(0x10325476);
h4 = uint32(0xC3D2E1F0);
% Pre-processing:
% append the bit '1' to the message e.g. by adding 0x80 if message length is a multiple of 8 bits.
% append 0 ≤ k < 512 bits '0', such that the resulting message length in bits
% is congruent to -64 ≡ 448 (mod 512)
% append ml, the original message length in bits, as a 64-bit big-endian integer.
% Thus, the total length is a multiple of 512 bits.
%
message_len64 = uint64(length(bytes_in));
messages_len_bytes = zeros(8, 1);
for i=1:8
messages_len_bytes(i) = uint8(bitshift(message_len64, -64+i*8));
end
bytes_in = [bytes_in; 0x80];
padlen = 64-8 - mod(length(bytes_in), 64);
bytes_in = [bytes_in; zeros(padlen,1);messages_len_bytes];
assert(mod(length(bytes_in), 64) == 0);
chunk_count = length(bytes_in)/64;
% Process the message in successive 512-bit chunks:
% break message into 512-bit chunks
for i=1:chunk_count
chunk = bytes_in( ((i-1)*64+1):(i*64));
assert(length(chunk) == 64);
% Break chunk into sixteen 32-bit big-endian words w[i], 0 ≤ i ≤ 15
w = uint32(zeros(80,1));
for j=0:15
p1 = bitshift(uint32(chunk(j*4+1)),24);
p2 = bitshift(uint32(chunk(j*4+2)),16);
p3 = bitshift(uint32(chunk(j*4+3)),8);
p4 = bitshift(uint32(chunk(j*4+4)),0);
w(j+1) = p1 + p2 + p3 + p4;
end
% Message schedule: extend the sixteen 32-bit words into eighty 32-bit words:
for j=17:80
temp = bitxor(bitxor(bitxor(w(j-3),w(j-8)), w(j-14)), w(j-16));
w(j) = leftrotate32(temp, 1);
end
% Initialize hash value for this chunk:
a = h0;
b = h1;
c = h2;
d = h3;
e = h4;
for j=1:80
if j >= 1 && j <= 20
f = bitor(bitand(b,c), bitand(bitcmp(b), d));
k = 0x5A827999;
elseif j >= 21 && j <= 40
f = bitxor(bitxor(b,c), d);
k = 0x6ED9EBA1;
elseif j >= 41 && j <= 60
f = bitor(bitor(bitand(b, c),bitand(b, d)), bitand(c, d)) ;
k = 0x8F1BBCDC;
elseif j >= 61 && j <= 80
f = bitxor(bitxor(b,c), d);
k = 0xCA62C1D6;
end
temp = uint64(leftrotate32(a,5)) + uint64(f) + uint64(e) + uint64(k) + uint64(w(j));
temp = uint32(bitand(temp, uint64(0xFFFFFFFF)));
e = d;
d = c;
c = leftrotate32(b, 30);
b = a;
a = temp;
end
% Add this chunk's hash to result so far:
h0 = uint32(bitand(uint64(h0) + uint64(a), uint64(0xFFFFFFFF)));
h1 = uint32(bitand(uint64(h1) + uint64(b), uint64(0xFFFFFFFF)));
h2 = uint32(bitand(uint64(h2) + uint64(c), uint64(0xFFFFFFFF)));
h3 = uint32(bitand(uint64(h3) + uint64(d), uint64(0xFFFFFFFF)));
h4 = uint32(bitand(uint64(h4) + uint64(e), uint64(0xFFFFFFFF)));
% Produce the final hash value (big-endian) as a 160-bit number:
hh = [dec2hex(h0, 8), dec2hex(h1, 8), dec2hex(h2, 8), dec2hex(h3, 8), dec2hex(h4, 8)];
assert(length(hh) == 160/8*2)
end
end
function vout = leftrotate32(v32, v)
vout = uint32(bin2dec(circshift(dec2bin(v32, 32), -v)));
end

Accelerate Matlab nested for loop with bsxfun

I have an n x n graph W, described by its adjacency matrix, and an n-vector of group labels (integers), one for every node.
I need to count the number of links (edges) between nodes in group c and nodes in group d for every pair of groups. To do this I wrote a nested for loop, but I'm sure this is not the fastest way to compute the matrix that I call mcd in the code, i.e. the matrix that counts the number of edges between group c and group d.
Is it possible to make this operation faster with bsxfun?
function mcd = interlinks(W,ci)
%// W is the adjacency matrix of a simple undirected graph
%// ci are the group labels of every node in the graph, can be from 1 to |C|
n = length(W); %// number of nodes in the graph
m = sum(nonzeros(triu(W))); %// number of edges in the graph
ncomms = length(unique(ci)); %// number of groups of ci
mcd = zeros(ncomms); %// this is the matrix that counts the number of edges between group c and group d, twice the number of it if c==d
for c=1:ncomms
nodesc = find(ci==c); %// nodes in group c
for d=1:ncomms
nodesd = find(ci==d); %// nodes in group d
M = W(nodesc,nodesd); %// submatrix of edges between c and d
mcd(c,d) = sum(sum(M)); %// count of edges between c and d
end
end
%// Divide diagonal half because counted twice
mcd(1:ncomms+1:ncomms*ncomms)=mcd(1:ncomms+1:ncomms*ncomms)/2;
For example in the picture here the adjacency matrix is
W=[0 1 1 0 0 0;
1 0 1 1 0 0;
1 1 0 0 1 1;
0 1 0 0 1 0;
0 0 1 1 0 1;
0 0 1 0 1 0];
the group label vector is ci=[ 1 1 1 2 2 3] and the resulting matrix mcd is:
mcd=[3 2 1;
2 1 1;
1 1 0];
It means for example that group 1 has 3 links with itself, 2 links with group 2 and 1 link with group 3.
How about this?
C = bsxfun(@eq, ci,unique(ci)');
mcd = C*W*C'
mcd(logical(eye(size(mcd)))) = mcd(logical(eye(size(mcd))))./2;
I think it is what you wanted.
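To see why this works, here is a small check on the example from the question. It is a sketch under the question's assumptions (simple undirected graph, ci given as a row vector); the double() cast and the name d are my additions.

```matlab
% Example graph and labels from the question
W = [0 1 1 0 0 0;
     1 0 1 1 0 0;
     1 1 0 0 1 1;
     0 1 0 0 1 0;
     0 0 1 1 0 1;
     0 0 1 0 1 0];
ci = [1 1 1 2 2 3];

% C(g,i) is 1 when node i belongs to group g (ncomms-by-n indicator matrix)
C = double(bsxfun(@eq, ci, unique(ci)'));

% (C*W*C')(c,d) sums W(i,j) over all i in group c and j in group d
mcd = C * W * C';

% within-group edges were counted twice, so halve the diagonal
d = logical(eye(size(mcd)));
mcd(d) = mcd(d) / 2;
```

This reproduces the mcd matrix given in the question, [3 2 1; 2 1 1; 1 1 0].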
IIUC, and assuming ci is a sorted array, you are basically doing blockwise summations, but with irregular block sizes. Thus, you can take cumulative sums along the rows and columns and then difference them at the shift positions in ci, which gives you the blockwise summations.
The implementation would look like this -
%// Get cumulative sums row-wise and column-wise
csums = cumsum(cumsum(W,1),2)
%// Get IDs of shifts (last index of each group) and thus get cumsums at those positions
[~,idx] = unique(ci,'last') %// OR find(diff([ci numel(ci)]))
csums_indexed = csums(idx,idx)
%// Get the blockwise summations by differentiations on csums at shifts
col1 = diff(csums_indexed(:,1),[],1)
row1 = diff(csums_indexed(1,:),[],2)
rest2D = diff(diff(csums_indexed,[],2),[],1)
out = [[csums_indexed(1,1) ; col1] [row1 ; rest2D]]
If you're not opposed to a mex function, you can use my code below.
testing code
n = 2000;
n_labels = 800;
W = rand(n, n);
W = W * W' > .5; % generate symmetric adjacency matrix of logicals
Wd = double(W);
ci = floor(rand(n, 1) * n_labels ) + 1; % generate ids from 1 to n_labels
[C, IA, IC] = unique(ci);
disp(sprintf('base avg fun time = %g ',timeit(@() interlinks(W, IC))));
disp(sprintf('mex avg fun time = %g ',timeit(@() interlink_mex(W, IC))));
%note this function requires symmetric (function from @aarbelle)
disp(sprintf('bsx avg fun time = %g ',timeit(@() interlinks_bsx(Wd, IC'))));
x1 = interlinks(W, IC);
x2 = interlink_mex(W, IC);
x3 = interlinks_bsx(Wd, IC');
disp(sprintf('norm(x1 - x2) = %g', norm(x1 - x2)));
disp(sprintf('norm(x1 - x3) = %g', norm(x1 - x3)));
testing results
Testing results with these settings:
base avg fun time = 4.94275
mex avg fun time = 0.0373092
bsx avg fun time = 0.126406
norm(x1 - x2) = 0
norm(x1 - x3) = 0
Basically, for small n_labels the bsx function does very well, but you can make n_labels large enough that the mex function is faster.
c++ code
Throw it into a file such as interlink_mex.cpp and compile with mex interlink_mex.cpp. You need a C++ compiler on your machine.
#include "mex.h"
#include "matrix.h"
#include <math.h>
// Author: Matthew Gunn
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) {
if(nrhs != 2)
mexErrMsgTxt("Invalid number of inputs. Should be 2 input arguments.");
if(nlhs != 1)
mexErrMsgTxt("Invalid number of outputs. Should be 1 output argument.");
if(!mxIsLogical(prhs[0])) {
mexErrMsgTxt("First argument should be a logical array (i.e. type logical)");
}
if(!mxIsDouble(prhs[1])) {
mexErrMsgTxt("Second argument should be an array of type double");
}
const mxArray *W = prhs[0];
const mxArray *ci = prhs[1];
size_t W_m = mxGetM(W);
size_t W_n = mxGetN(W);
if(W_m != W_n)
mexErrMsgTxt("Rows and columns of W are not equal");
// size_t ci_m = mxGetM(ci);
size_t ci_n = mxGetNumberOfElements(ci);
mxLogical *W_data = mxGetLogicals(W);
// double *W_data = mxGetPr(W);
double *ci_data = mxGetPr(ci);
size_t *ci_data_size_t = (size_t*) mxCalloc(ci_n, sizeof(size_t));
size_t ncomms = 0;
double intpart;
for(size_t i = 0; i < ci_n; i++) {
double x = ci_data[i];
if(x < 1 || x > 65536 || modf(x, &intpart) != 0.0) {
mexErrMsgTxt("Input ci is not all integers from 1 to a maximum value of 65536 (can edit source code to change this)");
}
size_t xx = (size_t) x;
if(xx > ncomms)
ncomms = xx;
ci_data_size_t[i] = xx - 1;
}
mxArray *mcd = mxCreateDoubleMatrix(ncomms, ncomms, mxREAL);
double *mcd_data = mxGetPr(mcd);
for(size_t i = 0; i < W_n; i++) {
size_t ii = ci_data_size_t[i];
for(size_t j = 0; j < W_n; j++) {
size_t jj = ci_data_size_t[j];
mcd_data[ii + jj * ncomms] += (W_data[i + j * W_m] != 0);
}
}
for(size_t i = 0; i < ncomms * ncomms; i+= ncomms + 1) //go along diagonal
mcd_data[i]/=2; //divide by 2
mxFree(ci_data_size_t);
plhs[0] = mcd;
}

Create faster Fibonacci function for n > 100 in MATLAB / octave

I have a function that tells me the nth number in a Fibonacci sequence. The problem is that it becomes very slow when trying to find larger numbers in the sequence. Does anyone know how I can fix this?
function f = rtfib(n)
if (n==1)
f= 1;
elseif (n == 2)
f = 2;
else
f =rtfib(n-1) + rtfib(n-2);
end
The Results,
tic; rtfib(20), toc
ans = 10946
Elapsed time is 0.134947 seconds.
tic; rtfib(30), toc
ans = 1346269
Elapsed time is 16.6724 seconds.
I can't even get a value after 5 mins doing rtfib(100)
PS: I'm using octave 3.8.1
If time is important (not programming techniques):
function f = fib(n)
if (n == 1)
f = 1;
elseif (n == 2)
f = 2;
else
fOld = 2;
fOlder = 1;
for i = 3 : n
f = fOld + fOlder;
fOlder = fOld;
fOld = f;
end
end
end
tic;fib(40);toc; ans = 165580141; Elapsed time is 0.000086 seconds.
You could even use uint64. n = 92 is the most you can get from uint64:
tic;fib(92);toc; ans = 12200160415121876738; Elapsed time is 0.001409 seconds.
Because,
fib(93) = 19740274219868223167 > intmax('uint64') = 18446744073709551615
Edit
In order to get fib(n) up to n = 183, It is possible to use two uint64 as one number,
with a special function for summation,
function [] = fib(n)
fL = uint64(0);
fH = uint64(0);
MaxNum = uint64(1e19);
if (n == 1)
fL = 1;
elseif (n == 2)
fL = 2;
else
fOldH = uint64(0);
fOlderH = uint64(0);
fOldL = uint64(2);
fOlderL = uint64(1);
for i = 3 : n
[fL q] = LongSum (fOldL , fOlderL , MaxNum);
fH = fOldH + fOlderH + q;
fOlderL = fOldL;
fOlderH = fOldH;
fOldL = fL;
fOldH = fH;
end
end
sprintf('%u',fH,fL)
end
LongSum is:
function [s q] = LongSum (a, b, MaxNum)
if a + b >= MaxNum
q = 1;
if a >= MaxNum
s = a - MaxNum;
s = s + b;
elseif b >= MaxNum
s = b - MaxNum;
s = s + a;
else
s = MaxNum - a;
s = b - s;
end
else
q = 0;
s = a + b;
end
Note some complications in LongSum might seem unnecessary, but they are not!
(All the deal with inner if is that I wanted to avoid s = a + b - MaxNum in one command, because it might overflow and store an irrelevant number in s)
Results
tic;fib(159);toc; Elapsed time is 0.009631 seconds.
ans = 1226132595394188293000174702095995
tic;fib(183);toc; Elapsed time is 0.009735 seconds.
fib(183) = 127127879743834334146972278486287885163
However, you have to be careful about sprintf.
I also did it with three uint64, and I could get up to,
tic;fib(274);toc; Elapsed time is 0.032249 seconds.
ans = 1324695516964754142521850507284930515811378128425638237225
(It's pretty much the same code, but I could share it if you are interested).
Note that we have fib(1) = 1, fib(2) = 2 according to the question, while fib(1) = 1, fib(2) = 1 is more common; the first 300 fibs are listed here (thanks to @Rick T).
It seems the Fibonacci series follows the golden ratio, as discussed in some detail here.
This was used in this MATLAB File Exchange code, and I am writing just the essence of it here -
sqrt5 = sqrt(5);
alpha = (1 + sqrt5)/2; %// alpha = 1.618... is the golden ratio
fibs = round( alpha.^n ./ sqrt5 )
You can feed an integer into n for the nth number in Fibonacci Series or feed an array 1:n to have the whole series.
Please note that this method holds good till n = 69 only.
If you have access to the Symbolic Math Toolbox in MATLAB, you could always just call the Fibonacci function from MuPAD:
>> fib = @(n) evalin(symengine, ['numlib::fibonacci(' num2str(n) ')'])
>> fib(274)
ans =
818706854228831001753880637535093596811413714795418360007
It is pretty fast:
>> timeit(@() fib(274))
ans =
0.0011
Plus you can go for as large numbers as you want (limited only by how much RAM you have!), and it is still blazing fast:
% see if you can beat that!
>> tic
>> x = fib(100000);
>> toc % Elapsed time is 0.004621 seconds.
% result has more than 20 thousand digits!
>> length(char(x)) % 20899
Here is the full value of fib(100000): http://pastebin.com/f6KPGKBg
To reach large numbers you can use symbolic computation. The following works in Matlab R2010b.
syms x y %// declare variables
z = x + y; %// define formula
xval = '0'; %// initialize x, y values
yval = '1';
for n = 2:300
zval = subs(z, [x y], {xval yval}); %// update z value
disp(['Iteration ' num2str(n) ':'])
disp(zval)
xval = yval; %// shift values
yval = zval;
end
You can do it in O(log n) time with matrix exponentiation:
X = [0 1
1 1]
X^n will give you the nth fibonacci number in the lower right-hand corner; X^n can be represented as the product of several matrices X^(2^i), so for example X^11 would be X^1 * X^2 * X^8, i <= log_2(n). And X^8 = (X^4)^2, etc, so at most 2*log(n) matrix multiplications.
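The repeated-squaring idea can be sketched as follows. This is a minimal illustration, not a tuned implementation: the function name fib_matpow is my own, it follows the question's convention fib(1) = 1, fib(2) = 2, and because it uses plain doubles the result is only exact while the values stay below 2^53 (roughly n in the mid-70s).

```matlab
function f = fib_matpow(n)
% n-th Fibonacci number (fib(1) = 1, fib(2) = 2) via exponentiation by squaring
X = [0 1; 1 1];
R = eye(2);          % accumulates X^n
k = n;
while k > 0
    if mod(k, 2) == 1
        R = R * X;   % this binary digit of n is set: multiply the power in
    end
    X = X * X;       % X, X^2, X^4, X^8, ...
    k = floor(k / 2);
end
f = R(2, 2);         % lower right-hand corner of X^n
end
```

For example, fib_matpow(20) returns 10946 and fib_matpow(30) returns 1346269, the same values the question reports for rtfib(20) and rtfib(30), using only a handful of 2x2 multiplications.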
One performance issue is that you use a recursive solution. Going for an iterative method will spare you of the argument passing for each function call. As Olivier pointed out, it will reduce the complexity to linear.
You can also look here. Apparently there's a formula that computes the n'th member of the Fibonacci sequence. I tested it up to the 50th element. For higher n values it's not very accurate.
The implementation of a fast Fibonacci computation in Python could be as follows. I know this is Python, not MATLAB/Octave, but it might be helpful.
Basically, rather than calling the same Fibonacci function over and over again with O(2^n) complexity, we store the Fibonacci sequence in a list/array, which is O(n):
#!/usr/bin/env python3.5
class Fib:
    def __init__(self,n):
        self.n=n
        self.fibList=[None]*(self.n+1)
        self.populateFibList()
    def populateFibList(self):
        for i in range(len(self.fibList)):
            if i==0:
                self.fibList[i]=0
            if i==1:
                self.fibList[i]=1
            if i>1:
                self.fibList[i]=self.fibList[i-1]+self.fibList[i-2]
    def getFib(self):
        print('Fibonacci sequence up to ', self.n, ' is:')
        for i in range(len(self.fibList)):
            print(i, ' : ', self.fibList[i])
        return self.fibList[self.n]
def isNonnegativeInt(value):
    try:
        if int(value)>=0:#throws an exception if non-convertible to int: returns False
            return True
        else:
            return False
    except:
        return False
n=input('Please enter a non-negative integer: ')
while isNonnegativeInt(n)==False:
    n=input('A non-negative integer is needed: ')
n=int(n) # convert string to int
print('We are using ', n, 'based on what you entered')
print('Fibonacci result is ', Fib(n).getFib())
Output for n=12 would be like:
I tested the runtime for n=100, 300, 1000 and the code is really fast, I don't even have to wait for the output.
One simple way to speed up the recursive implementation of a Fibonacci function is to realize that, substituting f(n-1) by its definition,
f(n) = f(n-1) + f(n-2)
= f(n-2) + f(n-3) + f(n-2)
= 2*f(n-2) + f(n-3)
This simple transformation greatly reduces the number of steps taken to compute a number in the series.
If we start with OP's code, slightly corrected:
function result = fibonacci(n)
switch n
case 0
result = 0;
case 1
result = 1;
case 2
result = 1;
case 3
result = 2;
otherwise
result = fibonacci(n-2) + fibonacci(n-1);
end
And apply our transformation:
function result = fibonacci_fast(n)
switch n
case 0
result = 0;
case 1
result = 1;
case 2
result = 1;
case 3
result = 2;
otherwise
result = fibonacci_fast(n-3) + 2*fibonacci_fast(n-2);
end
Then we see a 30x speed improvement for computing the 20th number in the series (using Octave):
>> tic; for ii=1:100, fibonacci(20); end; toc
Elapsed time is 12.4393 seconds.
>> tic; for ii=1:100, fibonacci_fast(20); end; toc
Elapsed time is 0.448623 seconds.
Of course Rashid's non-recursive implementation is another 60x faster still: 0.00706792 seconds.

Matlab FFT and home brewed FFT

I'm trying to verify an FFT algorithm I should use for a project against the same thing in Matlab.
The point is that with my own C FFT function I always get the right part (the second half) of the double-sided FFT spectrum evaluated in Matlab, and not the first half as "expected".
For instance, if my third bin is in the form a+i*b, the third bin of Matlab's FFT is a-i*b. The a and b values are the same, but I always get the complex conjugate of Matlab's.
I know that in terms of amplitudes and power there's no trouble (because of the absolute value), but I wonder whether in terms of phases I'm always going to read wrong angles.
I'm not skilled enough in Matlab to know (and I have not found useful info on the web) whether Matlab's FFT maybe returns the spectrum with negative frequencies first and then positive ones, or whether I have to fix my FFT algorithm, or whether it is all OK because phases are unchanged regardless of which part of the FFT we choose as the single-sided spectrum (but I doubt this last option).
Example:
If S is the sample array with N=512 samples, Y = fft(S) in Matlab returns the FFT as (the signs of the imaginary parts in the first half of the array are random, just to show the complex conjugate difference for the second part):
1 A1 + i*B1 (DC, B1 is always zero)
2 A2 + i*B2
3 A3 - i*B3
4 A4 + i*B4
5 A5 + i*B5
...
253 A253 - i*B253
254 A254 + i*B254
255 A255 + i*B255
256 A256 + i*B256
257 A257 - i*B257 (Nyquist, B257 is always zero)
258 A256 - i*B256
259 A255 - i*B255
260 A254 - i*B254
261 A253 + i*B253
...
509 A5 - i*B5
510 A4 - i*B4
511 A3 + i*B3
512 A2 - i*B2
My FFT implementation returns only 256 values (and that's ok) in the Y array as:
1 1 A1 + i*B1 (A1 is the DC, B1 is Nyquist, both are pure Real numbers)
2 512 A2 - i*B2
3 511 A3 + i*B3
4 510 A4 - i*B4
5 509 A5 + i*B5
...
253 261 A253 + i*B253
254 260 A254 - i*B254
255 259 A255 - i*B255
256 258 A256 - i*B256
Where the first column is the proper index of my Y array and the second is just the reference of the relative row in the Matlab FFT implementation.
As you can see, my FFT implementation (DC apart) returns the FFT like the second half of Matlab's FFT (in reverse order).
To summarize: even if I use fftshift as suggested, it seems that my implementation always returns what in the Matlab FFT should be considered the negative part of the spectrum.
Where is the error???
This is the code I use:
Note 1: the FFT array is not declared here and it is changed inside the function. Initially it holds the N samples (real values) and at the end it contains the N/2+1 bins of the single-sided FFT spectrum.
Note 2: the N/2+1 bins are stored in N/2 elements only because the DC component is always real (and it is stored in FFT[0]) and so is the Nyquist component (stored in FFT[1]). Apart from this exception, every even element K holds a real part and the odd element K+1 holds the matching imaginary part.
void Fft::FastFourierTransform( bool inverseFft ) {
double twr, twi, twpr, twpi, twtemp, ttheta;
int i, i1, i2, i3, i4, c1, c2;
double h1r, h1i, h2r, h2i, wrs, wis;
int nn, ii, jj, n, mmax, m, j, istep, isign;
double wtemp, wr, wpr, wpi, wi;
double theta, tempr, tempi;
// NS is the number of samples and it must be a power of two
if( NS == 1 )
return;
if( !inverseFft ) {
ttheta = 2.0 * PI / NS;
c1 = 0.5;
c2 = -0.5;
}
else {
ttheta = 2.0 * PI / NS;
c1 = 0.5;
c2 = 0.5;
ttheta = -ttheta;
twpr = -2.0 * Pow( Sin( 0.5 * ttheta ), 2 );
twpi = Sin(ttheta);
twr = 1.0+twpr;
twi = twpi;
for( i = 2; i <= NS/4+1; i++ ) {
i1 = i+i-2;
i2 = i1+1;
i3 = NS+1-i2;
i4 = i3+1;
wrs = twr;
wis = twi;
h1r = c1*(FFT[i1]+FFT[i3]);
h1i = c1*(FFT[i2]-FFT[i4]);
h2r = -c2*(FFT[i2]+FFT[i4]);
h2i = c2*(FFT[i1]-FFT[i3]);
FFT[i1] = h1r+wrs*h2r-wis*h2i;
FFT[i2] = h1i+wrs*h2i+wis*h2r;
FFT[i3] = h1r-wrs*h2r+wis*h2i;
FFT[i4] = -h1i+wrs*h2i+wis*h2r;
twtemp = twr;
twr = twr*twpr-twi*twpi+twr;
twi = twi*twpr+twtemp*twpi+twi;
}
h1r = FFT[0];
FFT[0] = c1*(h1r+FFT[1]);
FFT[1] = c1*(h1r-FFT[1]);
}
if( inverseFft )
isign = -1;
else
isign = 1;
n = NS;
nn = NS/2;
j = 1;
for(ii = 1; ii <= nn; ii++) {
i = 2*ii-1;
if( j>i ) {
tempr = FFT[j-1];
tempi = FFT[j];
FFT[j-1] = FFT[i-1];
FFT[j] = FFT[i];
FFT[i-1] = tempr;
FFT[i] = tempi;
}
m = n/2;
while( m>=2 && j>m ) {
j = j-m;
m = m/2;
}
j = j+m;
}
mmax = 2;
while(n>mmax) {
istep = 2*mmax;
theta = 2.0 * PI /(isign*mmax);
wpr = -2.0 * Pow( Sin( 0.5 * theta ), 2 );
wpi = Sin(theta);
wr = 1.0;
wi = 0.0;
for(ii = 1; ii <= mmax/2; ii++) {
m = 2*ii-1;
for(jj = 0; jj <= (n-m)/istep; jj++) {
i = m+jj*istep;
j = i+mmax;
tempr = wr*FFT[j-1]-wi*FFT[j];
tempi = wr*FFT[j]+wi*FFT[j-1];
FFT[j-1] = FFT[i-1]-tempr;
FFT[j] = FFT[i]-tempi;
FFT[i-1] = FFT[i-1]+tempr;
FFT[i] = FFT[i]+tempi;
}
wtemp = wr;
wr = wr*wpr-wi*wpi+wr;
wi = wi*wpr+wtemp*wpi+wi;
}
mmax = istep;
}
if( inverseFft )
for(i = 1; i <= 2*nn; i++)
FFT[i-1] = FFT[i-1]/nn;
if( !inverseFft ) {
twpr = -2.0 * Pow( Sin( 0.5 * ttheta ), 2 );
twpi = Sin(ttheta);
twr = 1.0+twpr;
twi = twpi;
for(i = 2; i <= NS/4+1; i++) {
i1 = i+i-2;
i2 = i1+1;
i3 = NS+1-i2;
i4 = i3+1;
wrs = twr;
wis = twi;
h1r = c1*(FFT[i1]+FFT[i3]);
h1i = c1*(FFT[i2]-FFT[i4]);
h2r = -c2*(FFT[i2]+FFT[i4]);
h2i = c2*(FFT[i1]-FFT[i3]);
FFT[i1] = h1r+wrs*h2r-wis*h2i;
FFT[i2] = h1i+wrs*h2i+wis*h2r;
FFT[i3] = h1r-wrs*h2r+wis*h2i;
FFT[i4] = -h1i+wrs*h2i+wis*h2r;
twtemp = twr;
twr = twr*twpr-twi*twpi+twr;
twi = twi*twpr+twtemp*twpi+twi;
}
h1r = FFT[0];
FFT[0] = h1r+FFT[1]; // DC
FFT[1] = h1r-FFT[1]; // FS/2 (NYQUIST)
}
return;
}
In matlab try using fftshift(fft(...)). Matlab doesn't automatically shift the spectrum after the FFT is called which is why they implemented the fftshift() function.
It is simply a Matlab formatting thing. Basically, Matlab arranges the Fourier transform in the following order:
DC, DC+1, ..., Nyquist-1, -Nyquist, -Nyquist+1, ..., DC-1
Let's say you have an 8-point sequence: [1 2 3 1 4 5 1 3]
In your signal processing class, your professor probably draws the Fourier spectrum on a Cartesian system (negative -> positive along the x axis), so your DC should be located at 0 on the x axis (the 4th position in your fft sequence, assuming the position index is 0-based).
In Matlab, the DC is the very first element in the fft sequence, so you need to use fftshift() to swap the first half and second half of the fft sequence such that DC is located at the 4th position (0-based).
I am attaching a graph here so you may have a visual:
where a is the original 8-point sequence; FT(a) is the Fourier transform of a.
The matlab code is here:
a = [1 2 3 1 4 5 1 3];
A = fft(a);
N = length(a);
x = -N/2:N/2-1;
figure
subplot(3,1,1), stem(x, a,'o'); title('a'); xlabel('time')
subplot(3,1,2), stem(x, fftshift(abs(A),2),'o'); title('FT(a) in signal processing'); xlabel('frequency')
subplot(3,1,3), stem(x, abs(A),'o'); title('FT(a) in matlab'); xlabel('frequency')
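Since the question is about conjugate bins, it may help to check numerically how the two halves of the output relate for a real input: each bin in the second half is the complex conjugate of its mirror bin in the first half. The sketch below reuses the 8-point sequence above; the names k, mirror and err are illustrative only.

```matlab
a = [1 2 3 1 4 5 1 3];   % real-valued input
A = fft(a);
N = numel(a);

k      = 2:N/2;          % positive-frequency bins (1-based), excluding DC and Nyquist
mirror = N - k + 2;      % the matching negative-frequency bins

% for real input, A(k) equals conj(A(mirror)) up to rounding error
err = max(abs(A(k) - conj(A(mirror))));
```

So the magnitudes in the two halves are identical, while the phase angles have opposite signs. That is consistent with the a+i*b versus a-i*b difference described in the question.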

matlab: convert a string of hex values to a decimal value?

I wrote functions to convert 100,000 hex strings to values, but it takes 10 seconds to perform on the whole array. Does Matlab have a function to do this, so that it is faster, ... ie: less than 1 second for the array?
function x = hexstring2dec(s)
[m n] = size(s);
x = zeros(1, m);
for i = 1 : m
for j = n : -1 : 1
x(i) = x(i) + hexchar2dec(s(i, j)) * 16 ^ (n - j);
end
end
function x = hexchar2dec(c)
if c >= 48 && c <= 57
x = c - 48;
elseif c >= 65 && c <= 70
x = c - 55;
elseif c >= 97 && c <= 102
x = c - 87;
end
Try using hex2dec. It should be much faster than looping over each character.
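A minimal usage sketch, assuming the hex strings are stored as rows of a char matrix as in the question's function (the three sample strings are made up):

```matlab
% one zero-padded hex number per row; upper and lower case both work
s = ['00FF';
     '1A2b';
     'ffff'];

x = hex2dec(s);   % column vector with one decimal value per row
```

Here x comes out as [255; 6699; 65535], with no explicit loop over rows or characters.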
shoelzer's answer is obviously the best.
However, if you want to do the conversion by yourself, then you might find this useful:
Assuming s is a char matrix: all hex numbers are of the same length (zero padded if necessary) and each row has a single number. Then
ds = double( upper(s) ); % convert to double
sel = ds >= double('A'); % select A-F
ds( sel ) = ds( sel ) - double('A') + 10; % convert to 10 - 15
ds(~sel) = ds(~sel) - double('0'); % convert 0-9
% do the sum through vector product
v = 16.^( (size(s,2)-1):-1:0 );
x = ds * v(:); % use the converted digit values in ds, not the char matrix s