I'm analysing hundreds of data files iteratively. Here is an example of the sort of data that I have:
start_time = datenum('1990-01-01');
end_time = datenum('2009-12-31');
time = start_time:end_time;
datx = rand(length(time),1);
daty = datx-2;
where I have a time variable and two data variables.
After loading the data I then need to pass the data through a function. However, I need to do this by including firstly the data from year 1 only, then from years 1 to 2; 1 to 3, 1 to 4 and so on until I pass the data through the function for the entire series. This can be performed with a loop with the following:
% split into different years
datev = datevec(time);
iyear = datev(:,1);
unique_year = unique(iyear);
for k = 1:length(unique_year);
idx = find(iyear >= unique_year(1) & iyear <= unique_year(k));
% select data for year
d_time = time(idx);
d_datx = datx(idx);
d_daty = daty(idx);
% now select individual years from this subset
datev2 = datevec(d_time);
iyear2 = datev2(:,1);
unique_year2 = unique(iyear2);
for k2 = 1:length(unique_year2);
idx2 = find(iyear2 == unique_year2(k2));
% select data for year
d_time2 = d_time(idx2);
d_datx2 = d_datx(idx2);
d_daty2 = d_daty(idx2);
% pass through some function
mae_out = some_function(d_datx2, d_daty2);
mae(k2) = mae_out;
end
mean_mae(k) = mean(mae);
end
function mae = some_function(datx, daty)
mae = mean(abs(datx - daty));
end
Note here that I'm using a very simple function as an example, and the actual function is more complex.
Having two loops like this takes a long time to run on my actual data. Is there a better/faster way that I can perform the above, possibly without loops?
If you record the previous result, you do not need the inner loop. You are currently computing a total of 20*21/2 = 210 iterations, but you only need to compute 20. The key here is that mean(a(1:k)) == (mean(a(1:k-1))*(k-1) + a(k)) / k (by the definition of the mean). Another optimization is to use logical indexing instead of find. The logical mask takes up a bit more space, but indexing with it is generally faster.
% split into different years
datev = datevec(time);
iyear = datev(:,1);
unique_year = unique(iyear);
for k = 1:length(unique_year)
    idx = (iyear == unique_year(k));
    % select data for year k only
    d_time = time(idx);
    d_datx = datx(idx);
    d_daty = daty(idx);
    mae_out = some_function(d_datx, d_daty);
    if k == 1
        mean_mae(k) = mae_out;
    else
        % update the running mean with this year's value
        mean_mae(k) = (mean_mae(k-1) * (k-1) + mae_out) / k;
    end
end
function mae = some_function(datx, daty)
mae = mean(abs(datx - daty));
end
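This cuts the number of calls to some_function from 210 to 20, so you should see roughly a 10x speedup, and possibly more because the repeated datevec and find calls are gone as well.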
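As a quick sanity check of the running-mean identity, here is a minimal sketch that compares the direct and the incremental computation. The vector a stands in for the per-year MAE values and is made up for illustration; direct, running and vectorised are hypothetical names.

```matlab
% Hypothetical per-year values; any row vector works here.
a = [0.5 1.5 1.0 3.0 2.0];

% Direct version: recompute mean(a(1:k)) from scratch for every k.
direct = zeros(size(a));
for k = 1:numel(a)
    direct(k) = mean(a(1:k));
end

% Incremental version: reuse the previous mean, as in the answer above.
running = zeros(size(a));
running(1) = a(1);
for k = 2:numel(a)
    running(k) = (running(k-1) * (k-1) + a(k)) / k;
end

% Loop-free equivalent: cumulative sum divided by the element count.
vectorised = cumsum(a) ./ (1:numel(a));

fprintf('largest difference: %g\n', max(abs(direct - running)));
```

If the per-year values can be computed up front, the cumsum form removes the need for the if/else branch entirely.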
I wrote my own function for Octave, but unfortunately, aside from the final result value, the variable "result" is written to the console on every change, which is unwanted behavior.
>> a1 = [160 60]
a1 =
160 60
>> entr = my_entropy({a1}, false)
result = 0.84535
entr = 0.84535
Should be
>> a1 = [160 60]
a1 =
160 60
>> entr = my_entropy({a1}, false)
entr = 0.84535
I don't understand how ~ is supposed to help, and it didn't work, at least when I tried it.
Code is as follows:
# The main difference between MATLAB bundled entropy function
# and this custom function is that they use a transformation to uint8
# and the bundled entropy() function is used mostly for signal processing
# while I simply use a straightforward solution useful e.g. for learning trees
function f = my_entropy(data, weighted)
# function accepts only cell arrays;
# weighted tells whether to return one weighted average entropy
# or return a vector of entropies per bucket
# moreover, I find vectors as the only representation of "buckets"
# in other words, vector = bucket (leaf of decision tree)
if nargin < 2
weighted = true;
end;
rows = @(x) size(x,1);
cols = @(x) size(x,2);
if weighted
result = 0;
else
result = [];
end;
for r = 1:rows(data)
for c = 1:cols(data) # in most cases this will be 1:1
omega = sum(data{r,c});
epsilon = 0;
for b = 1:cols(data{r,c})
epsilon = epsilon + ( (data{r,c}(b) / omega) * (log2(data{r,c}(b) / omega)) );
end;
if (-epsilon == 0) entropy = 0; else entropy = -epsilon; end;
if weighted
result = result + entropy
else
result = [result entropy]
end;
end;
end;
f = result;
end;
# test cases
cell1 = { [16];[16];[2 2 2 2 2 2 2 2];[12];[16] }
cell2 = { [16],[12];[16],[2];[2 2 2 2 2 2 2 2],[8 8];[12],[8 8];[16],[8 8] }
cell3 = { [16],[3 3];[16],[2];[2 2 2 2 2 2 2 2],[2 2];[12],[2];[16],[2] }
# end
In your code, you should end lines 39 and 41 (the two assignments to result inside the loops) with a semicolon ;.
Statements ending in a semicolon do not echo their value to stdout.
Add ; after result = result + entropy and result = [result entropy] in your code, or in general after any assignment that you don't want printed on screen.
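To make the effect of the semicolon concrete, here is a small sketch that computes the entropy of the [160 60] bucket from the question without printing any intermediate values. It is a vectorised variant written for illustration, not the asker's function; counts and p are hypothetical names.

```matlab
% Bucket of class counts taken from the question's example.
counts = [160 60];

p = counts / sum(counts);     % class probabilities; the ; keeps this silent
p = p(p > 0);                 % drop empty classes so log2(0) never occurs
entr = -sum(p .* log2(p));    % nothing is echoed because of the trailing ;

% Print only what we actually want to see.
fprintf('entr = %.5f\n', entr);
```

Removing any one of the trailing semicolons brings back the kind of echo shown in the question.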
If for some reason you can't modify the function, you can use evalc to prevent unwanted output (at least in Matlab). Note that the output in this case is obtained in char form:
T = evalc(expression) is the same as eval(expression) except that anything that would normally be written to the command window, except for error messages, is captured and returned in the character array T (lines in T are separated by \n characters).
As with any eval variant, this approach should be avoided if possible:
entr = evalc('my_entropy({a1}, false)');
I have written code that stores data in matrices, but I want to shorten it so that it loops over the matrices instead of repeating one block per matrix.
The number of matrices created is the known variable. If it was 3, the code would be:
for i = 1:31
if idx(i) == 1
C1 = [C1; Output2(i,:)];
end
if idx(i) == 2
C2 = [C2; Output2(i,:)];
end
if idx(i) == 3
C3 = [C3; Output2(i,:)];
end
end
If I understand correctly, you want to extract rows from Output2 into new variables based on idx values? If so, you can do as follows:
Output2 = rand(5, 10); % example
idx = [1,1,2,2,3];
% get the rows of Output2 whose entry in idx equals the given value
C1 = Output2(find(idx==1),:);
C2 = Output2(find(idx==2),:);
C3 = Output2(find(idx==3),:);
Similar to Marcin, I have another solution. Here I predefine my_C as a cell array. Output2 and idx are randomly generated, and instead of find I just use logical addressing. You have to wrap each matrix in braces {} to store it in a cell.
Output2 = round(rand(31,15)*10);
idx = uint8(round(1+rand(1,31)*2));
my_C = cell(1,3);
my_C(1,1) = {Output2(idx==1,:)};
my_C(1,2) = {Output2(idx==2,:)};
my_C(1,3) = {Output2(idx==3,:)};
If you want to get your data back just use e.g. my_C{1,1} for the first group.
If you have not 3 but n resulting matrices you can use:
Output2 = round(rand(31,15)*10);
idx = uint8(round(1+rand(1,31)*(n-1)));
my_C = cell(1,n);
for k=1:n
my_C(1,k) = {Output2(idx==k,:)};
end
Where n is a positive integer number
I would recommend a slightly different approach. Besides making the rest of the code more maintainable, it may also slightly speed up execution, because MATLAB uses a JIT compiler and code passed to eval must be recompiled every time. Try this:
nMatrices = 3;
for k = 1:nMatrices
C{k} = Output2(idx==k,:);
end
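The same grouping can also be written without an explicit loop. The sketch below uses arrayfun with 'UniformOutput' set to false so that each group lands in its own cell; the data and labels are made up for illustration, and nGroups is a hypothetical name.

```matlab
% Hypothetical data: five rows, each labelled with a group number in idx.
Output2 = [10 11; 20 21; 30 31; 40 41; 50 51];
idx = [2 1 3 2 2];

% One cell per group; logical indexing selects the matching rows.
nGroups = max(idx);
C = arrayfun(@(k) Output2(idx == k, :), 1:nGroups, 'UniformOutput', false);

% C{2} now holds rows 1, 4 and 5 of Output2.
disp(C{2});
```

Whether this is faster than the plain for loop depends on the data size; the main gain is that the number of groups no longer appears in the code.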
As patrik said in the comments, naming variables like this is poor practice. You would be better off using cell arrays M{1}=C1, or if all the Ci are the same size, even just a 3D array M, for example, where M(:,:,1)=C1.
If you really want to use C1, C2, ... as your variable names, I think you will have to use eval, as arielnmz mentioned. One way to do this in MATLAB (with C1, C2 and C3 initialized to [] beforehand) is
for i=1:length(idx)
eval(['C' num2str(idx(i)) '=[C' num2str(idx(i)) ';Output2(' num2str(i) ',:)];'])
end
Edited to add test code:
idx=[2 1 3 2 2 3];
Output2=rand(6,4);
C1a=[];
C2a=[];
C3a=[];
for i = 1:length(idx)
if idx(i) == 1
C1a = [C1a; Output2(i,:)];
end
if idx(i) == 2
C2a = [C2a; Output2(i,:)];
end
if idx(i) == 3
C3a = [C3a; Output2(i,:)];
end
end
C1=[];
C2=[];
C3=[];
for i=1:length(idx)
eval(['C' num2str(idx(i)) '=[C' num2str(idx(i)) ';Output2(' num2str(i) ',:)];'])
end
all(C1a(:)==C1(:))
all(C2a(:)==C2(:))
all(C3a(:)==C3(:))
The following is an M-file for generating an MD5 digest on a file of ASCII values. I have a string of HEX values and am converting it to ASCII, then writing these values to a file which I pass to the md5 function. I am using two different online MD5 calculators to validate the script's solution. Here is one of them.
The method by which I am converting from HEX string to ASCII is by pairing two hex values and then converting the pairs to ASCII chars which are then written to the file. I know this is being done correctly because I can pass either the HEX string to the above mentioned online calculators or upload their equivalent ASCII representation and they produce the same (valid) result even though the MATLAB script gives the wrong digest.
For some reason, whether the digest is computed correctly or not depends on the number of ASCII values being read from the file. I've tried to understand if there is any pattern to this behavior but I cannot find any. It produces a valid digest for messages with 200,300-to-322, 500,996,998,1008,1010,1050,1070,1076 HEX characters which are first converted to the ASCII file. But not for 1000,1002,1004,1006,1078,1100 HEX characters. In short, I see no method to this madness... any help would be much appreciated.
% md5 Compute MD5 hash function for files
%
% d = md5(FileName)
%
% md5() computes the MD5 hash function of
% the file specified in the string FileName
% and returns it as a 32-character array d.
% The MD5 message-digest algorithm is specified
% in RFC 1321.
% The code below is for instructional and illustrational
% purposes only. It is very clear, but very slow.
% (C) Stefan Stoll, ETH Zurich, 2006
function Digest = md5(FileName)
% Guard against old Matlab versions
MatlabVersion = version;
if MatlabVersion(1)<'7'
error('md5() requires Matlab 7.0 or later!');
end
% Run autotest if no parameters are given
if (nargin==0)
md5autotest;
return;
end
% Read in entire file into uint32 vector
[Message,nBits] = readmessagefromfile(FileName);
%--------------------------------------------------
% Append a bit-1 to the last bit read from file
BytesInLastInt = mod(nBits,32)/8;
if BytesInLastInt
Message(end) = bitset(Message(end),BytesInLastInt*8+8);
else
Message = [Message; uint32(128)];
end
% Append zeros
nZeros = 16 - mod(numel(Message)+2,16);
Message = [Message; zeros(nZeros,1,'uint32')];
% Append bit length of original message as uint64, lower significant uint32 first
Lower32 = uint32(nBits);
Upper32 = uint32(bitshift(uint64(nBits),-32));
Message = [Message; Lower32; Upper32];
%--------------------------------------------------
% 64-element transformation array
T = uint32(fix(4294967296*abs(sin(1:64))));
% 64-element array of number of bits for circular left shift
S = repmat([7 12 17 22; 5 9 14 20; 4 11 16 23; 6 10 15 21].',4,1);
S = S(:).';
% 64-element array of indices into X
idxX = [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ...
1 6 11 0 5 10 15 4 9 14 3 8 13 2 7 12 ...
5 8 11 14 1 4 7 10 13 0 3 6 9 12 15 2 ...
0 7 14 5 12 3 10 1 8 15 6 13 4 11 2 9] + 1;
% Initial state of buffer (consisting of A, B, C and D)
A = uint32(hex2dec('67452301'));
B = uint32(hex2dec('efcdab89'));
C = uint32(hex2dec('98badcfe'));
D = uint32(hex2dec('10325476'));
%--------------------------------------------------
Message = reshape(Message,16,[]);
% Loop over message blocks each 16 uint32 long
for iBlock = 1:size(Message,2)
% Extract next block
X = Message(:,iBlock);
% Store current buffer state
AA = A;
BB = B;
CC = C;
DD = D;
% Transform buffer using message block X and the
% parameters from S, T and idxX
k = 0;
for iRound = 1:4
for q = 1:4
A = Fun(iRound,A,B,C,D,X(idxX(k+1)),S(k+1),T(k+1));
D = Fun(iRound,D,A,B,C,X(idxX(k+2)),S(k+2),T(k+2));
C = Fun(iRound,C,D,A,B,X(idxX(k+3)),S(k+3),T(k+3));
B = Fun(iRound,B,C,D,A,X(idxX(k+4)),S(k+4),T(k+4));
k = k + 4;
end
end
% Add old buffer state
A = bitadd32(A,AA);
B = bitadd32(B,BB);
C = bitadd32(C,CC);
D = bitadd32(D,DD);
end
%--------------------------------------------------
% Combine uint32 from buffer to form message digest
Str = lower(dec2hex([A;B;C;D]));
Str = Str(:,[7 8 5 6 3 4 1 2]).';
Digest = Str(:).';
%==================================================
function y = Fun(iRound,a,b,c,d,x,s,t)
switch iRound
case 1
q = bitor(bitand(b,c),bitand(bitcmp(b),d));
case 2
q = bitor(bitand(b,d),bitand(c,bitcmp(d)));
case 3
q = bitxor(bitxor(b,c),d);
case 4
q = bitxor(c,bitor(b,bitcmp(d)));
end
y = bitadd32(b,rotateleft32(bitadd32(a,q,x,t),s));
%--------------------------------------------
function y = rotateleft32(x,s)
y = bitor(bitshift(x,s),bitshift(x,s-32));
%--------------------------------------------
function sum = bitadd32(varargin)
sum = varargin{1};
for k = 2:nargin
add = varargin{k};
carry = bitand(sum,add);
sum = bitxor(sum,add);
for q = 1:32
shift = bitshift(carry,1);
carry = bitand(shift,sum);
sum = bitxor(shift,sum);
end
end
function [Message,nBits] = readmessagefromfile(FileName)
[hFile,ErrMsg] = fopen(FileName,'r');
error(ErrMsg);
%Message = fread(hFile,inf,'bit32=>uint32');
Message = fread(hFile,inf,'ubit32=>uint32');
%Message = fread(hFile);
fclose(hFile);
d = dir(FileName);
nBits = d.bytes*8;
%============================================
function md5autotest
disp('Running md5 autotest...');
Messages{1} = '';
Messages{2} = 'a';
Messages{3} = 'abc';
Messages{4} = 'message digest';
Messages{5} = 'abcdefghijklmnopqrstuvwxyz';
Messages{6} = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
Messages{7} = char(128:255);
CorrectDigests{1} = 'd41d8cd98f00b204e9800998ecf8427e';
CorrectDigests{2} = '0cc175b9c0f1b6a831c399e269772661';
CorrectDigests{3} = '900150983cd24fb0d6963f7d28e17f72';
CorrectDigests{4} = 'f96b697d7cb7938d525a2f31aaf161d0';
CorrectDigests{5} = 'c3fcd3d76192e4007dfb496cca67e13b';
CorrectDigests{6} = 'd174ab98d277d9f5a5611c2c9f419d9f';
CorrectDigests{7} = '16f404156c0500ac48efa2d3abc5fbcf';
TmpFile = tempname;
for k=1:numel(Messages)
[h,ErrMsg] = fopen(TmpFile,'w');
error(ErrMsg);
fwrite(h,Messages{k},'char');
fclose(h);
Digest = md5(TmpFile);
fprintf('%d: %s\n',k,Digest);
if ~strcmp(Digest,CorrectDigests{k})
error('md5 autotest failed on the following string: %s',Messages{k});
end
end
delete(TmpFile);
disp('md5 autotest passed!');
This is quite old, but I looked through the code and I think that you are not handling the message length properly. I also found that MATLAB does some of the arithmetic differently from what you might expect: for instance, bitshift did not behave as I had expected, and integer arithmetic on uint32 variables saturates rather than wrapping around. You need to handle modular and truncating arithmetic explicitly, or read the documentation on those topics more closely. Also check the character encoding in your options: char(128:255) might not be translated correctly. Try creating the message with that command and pasting it into a different interface. Even within MATLAB, processing that variable once can corrupt it, and it comes out as a run of ? characters. If you were using floating-point arithmetic you might have introduced a rounding error as well, but check the actual length first.
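One place where the length handling can go wrong is the zero-padding line in the question, nZeros = 16 - mod(numel(Message)+2,16). When numel(Message)+2 is already a multiple of 16 it appends 16 zero words, a whole extra block, where the padding described in RFC 1321 calls for none. Assuming, as the code does, that fread returns the trailing partial word, this happens for files whose byte count is 52 to 55 modulo 64. That covers the 1000 to 1006 hex-character cases (500 to 503 bytes) but not the other failures listed, so treat it as a partial explanation. The sketch below compares the original expression with a possible replacement; nWords, nZerosOriginal and nZerosFixed are illustrative names, not part of the original file.

```matlab
% Hypothetical word counts after the 0x80 marker has been appended.
nWords = [14 15 16 126 127];

% Expression from the question: gives 16 instead of 0 when
% nWords + 2 is already a multiple of 16.
nZerosOriginal = 16 - mod(nWords + 2, 16);

% Possible replacement: pad only up to the next position that leaves
% exactly two words for the 64-bit length field.
nZerosFixed = mod(14 - nWords, 16);

% Total padded length in uint32 words for both variants.
totalOriginal = nWords + nZerosOriginal + 2;
totalFixed = nWords + nZerosFixed + 2;

disp([nWords; totalOriginal; totalFixed]);
```

Both variants produce a multiple of 16 words, so the reshape in the main function succeeds either way; the difference is the extra all-zero block, which changes the digest.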
So I have one problem with my code: one variable is not behaving as it should. Here is the code I'm using:
format long
f = inline('-x.^2');
for i = 0:10
[I(i+1) h(i+1) tid(i+1)] = trapets(f,0,1,2^i);
end
for i = 0:10
trunk(i+1) = I(i+1) - log(2);
end
hold on
grid on
plot(log(h),log(trunk),'r+')
t = -7:0;
c = polyfit(log(h),log(trunk),1);
yy = polyval(c,t);
plot(t,yy)
grid off
hold off
koefficienter = real(c)
and also this:
function [ I,h,tid ] = trapets(f,a,b,n )
h=(b-a)/n;
tic;
I=(f(a)+f(b));
for k=2:2:n-2
I = I+2*f(a+k*h);
end
for k = 1:2:n-1
I = I + 4*f(a+k*h);
end
I = I * h/3;
tid = toc;
end
So the problem here is that the variable I is not changing value. It gets 11 values when I run the first script (I don't run the function directly; it's only called by the script), but the values are all the same. I don't know if the problem is that the variable n, which I use in the function, never changes value, although I'm trying to change it with the "2^i" part in "trapets(f,0,1,2^i)". If n never changes value, is there a way to fix that?
And if the problem is not the variable n, why doesn't the variable I change value in the code?
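After running your program I found that I always equals -1/h after the for loops, so I = I * h/3; always gives the same result, -1/3. The variable n does change. The function implements Simpson's rule, which is exact for polynomials up to degree three, so for f = -x.^2 every n in your loop returns the exact integral.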
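To see the result change with n, the integrand has to be something Simpson's rule cannot integrate exactly. The sketch below assumes the intended integrand was 1/(1+x), whose integral over [0,1] is log(2), the value subtracted in the question's script; that is a guess, and simpson, quadratic and rational are hypothetical names.

```matlab
% Composite Simpson's rule as an anonymous function (n must be even).
simpson = @(f, a, b, n) ((b - a) / n / 3) * ( f(a) + f(b) ...
    + 4 * sum(f(a + (1:2:n-1) * (b - a) / n)) ...
    + 2 * sum(f(a + (2:2:n-2) * (b - a) / n)) );

quadratic = @(x) -x.^2;         % integrand from the question
rational  = @(x) 1 ./ (1 + x);  % assumed integrand with integral log(2)

n = 2 .^ (1:6);                 % even numbers of subintervals only
I_quad = arrayfun(@(m) simpson(quadratic, 0, 1, m), n);
I_rat  = arrayfun(@(m) simpson(rational, 0, 1, m), n);

% The quadratic stays at -1/3; the error of the rational integrand shrinks.
err_rat = abs(I_rat - log(2));
disp([n; I_quad; err_rat]);
```

Note that the question's loop starts at i = 0, which gives n = 1; Simpson's rule needs an even number of subintervals, so starting at i = 1 is the safer choice.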