Matlab Create MD5 Checksum for Variables - matlab

For debugging, I want to compare several objects by creating an ID for each of them that depends only on the object's contents and structure, so that two objects with equal contents and structure get the same ID. Is there an existing function for this?
For example, if the object is a structure S:
S.a1 = 1
S.a2 = 2
S.b1 = 3
S.b11 = 4
S.b12 = 5
S.c1 = 6
My current approach is to save it to disk and compute an MD5 checksum of the file, which does not work because the hash depends on the file's modification date.

One solution is the DataHash function mentioned here:
function H = DataHash(Data)
Engine = java.security.MessageDigest.getInstance('MD5');
H = CoreHash(Data, Engine);
H = sprintf('%.2x', H); % To hex string
function H = CoreHash(Data, Engine)
% Consider the type of empty arrays:
S = [class(Data), sprintf('%d ', size(Data))];
Engine.update(typecast(uint16(S(:)), 'uint8'));
H = double(typecast(Engine.digest, 'uint8'));
if isa(Data, 'struct')
n = numel(Data);
if n == 1 % Scalar struct:
F = sort(fieldnames(Data)); % ignore order of fields
for iField = 1:length(F)
H = bitxor(H, CoreHash(Data.(F{iField}), Engine));
end
else % Struct array:
for iS = 1:n
H = bitxor(H, CoreHash(Data(iS), Engine));
end
end
elseif isempty(Data)
% No further actions needed
elseif isnumeric(Data)
Engine.update(typecast(Data(:), 'uint8'));
H = bitxor(H, double(typecast(Engine.digest, 'uint8')));
elseif ischar(Data) % Silly TYPECAST cannot handle CHAR
Engine.update(typecast(uint16(Data(:)), 'uint8'));
H = bitxor(H, double(typecast(Engine.digest, 'uint8')));
elseif iscell(Data)
for iS = 1:numel(Data)
H = bitxor(H, CoreHash(Data{iS}, Engine));
end
elseif islogical(Data)
Engine.update(typecast(uint8(Data(:)), 'uint8'));
H = bitxor(H, double(typecast(Engine.digest, 'uint8')));
elseif isa(Data, 'function_handle')
H = bitxor(H, CoreHash(functions(Data), Engine));
else
warning(['Type of variable not considered: ', class(Data)]);
end
Also, you can find the complete version of the code here.

A more general solution than @OmG's answer, that relies on a little bit of undocumented functionality:
function str = hash(in)
% Get a bytestream from the input. Note that this calls saveobj.
inbs = getByteStreamFromArray(in);
% Create hash using Java Security Message Digest.
md = java.security.MessageDigest.getInstance('SHA1');
md.update(inbs);
% Convert to uint8.
d = typecast(md.digest, 'uint8');
% Convert to a hex string.
str = dec2hex(d)';
str = lower(str(:)');
The undocumented function getByteStreamFromArray returns the byte stream that would be written to disk if you were to call the save -v7 command on the variable. It works for any variable that is less than 2GB in size, including not only the built-in types (numeric, logical, struct, cell, etc.) covered by @OmG's CoreHash, but also built-in and user-defined classes.
Note that getByteStreamFromArray calls saveobj, so it will ignore Transient properties - this is almost certainly a good thing for hashing as well as saving.
PS In either solution, SHA1 is probably better than MD5.
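One caveat when comparing structs this way: DataHash above explicitly sorts the field names, whereas a raw byte stream presumably reflects the order in which the fields were created. A minimal sketch, assuming it is acceptable to normalize the field order with orderfields before hashing. hashBytes and demoOrderedHash are hypothetical names, not part of either answer, and orderfields only reorders the top level of a struct:

```matlab
function demoOrderedHash()
% Two structs with the same contents but a different field creation order.
S1 = struct('a1', 1, 'b1', 3);
S2 = struct('b1', 3, 'a1', 1);

% orderfields sorts the top-level field names, so both structs are
% serialized with the same field order before hashing.
h1 = hashBytes(getByteStreamFromArray(orderfields(S1)));
h2 = hashBytes(getByteStreamFromArray(orderfields(S2)));
fprintf('%s\n%s\nequal: %d\n', h1, h2, isequal(h1, h2));
end

function str = hashBytes(bytes)
% hashBytes (hypothetical helper): SHA-1 of a uint8 vector as lowercase hex.
md = java.security.MessageDigest.getInstance('SHA-1');
md.update(bytes);
str = sprintf('%.2x', typecast(md.digest, 'uint8'));
end
```

Nested structs or structs stored inside cells would need the same normalization applied recursively.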

Related

Matlab generate variable names when subdividing large data [duplicate]

I have a large data set (a vector) that I want to split into n smaller sets to look at later with other scripts. I.e., if n = 10, I want to turn one 1x80000000 double into ten 1x8000000 doubles. My idea is to turn the original into an n-by-m matrix and then save each row of the matrix into its own vector, as follows.
%data-n-splitter
n = 10 %number of sections
L = length(data);
Ls = L/n;
Ls = floor(Ls);
Counter = 1;
%converting vector to matrix
datamatrix = zeros(n,Ls);
for k = 1:n
datamatrix(k,:) = data(Counter:Counter+ Ls - 1);
Counter = Counter + Ls;
end
How do I make MATLAB loop this part of the code n times:
%save each row of matrix as separate vector
P1 = datamatrix(1,:);
P2 = datamatrix(2,:);
P3 = datamatrix(3,:);
P4 = datamatrix(4,:);
P5 = datamatrix(5,:);
P6 = datamatrix(6,:);
P7 = datamatrix(7,:);
P8 = datamatrix(8,:);
P9 = datamatrix(9,:);
P10 = datamatrix(10,:);
Example answer that I'm hoping for:
for k = 1:n
P('n') = datamatrix(n,:);
end
I've seen some articles about using cell arrays but the scripts I'm passing the variables to aren't set up for this so I'd rather not go down that route if possible.
There are several options:
use a struct, which comes closest to what you are hoping for,
use a cell, more convenient looping but no access over meaningful names,
use a higher-dimension matrix (in your case it is only 2D, but the same applies for 3D or higher). This is the most memory-efficient option.
To round this off, you could also use a table, which is a hybrid of a struct and a cell as you can use both notations to access it. There is no other benefit.
Now, how to do this? The simplest (and best) solution first: create a 2D matrix with reshape
Ary = 1:10; % I shrank your 1x80000000 array to 1x10 but you'll get the idea
%% create new structure
Mat = reshape(Ary,5,2);
%% access new structure (looping over columns)
for i = 1:size(Mat,2) % loop over the columns of Mat, not of Ary
% access columns through slicing
ary_sct = Mat(:,i);
% do something
end
Pro: memory efficient (requires the same amount of memory as the initial array); easy looping
Con: only works if you can slice the initial array evenly
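When the length is not an exact multiple of the number of sections, one option is to trim the vector first. The following is a sketch under that assumption (leftover elements are simply dropped, as in the question's own loop); it reproduces the question's n-by-Ls datamatrix without a loop. The 11-element data vector is a stand-in for the real data:

```matlab
data = 1:11;                  % stand-in vector; 11 is not divisible by n
n = 2;                        % number of sections
Ls = floor(numel(data) / n);  % section length; leftover elements are dropped

% reshape fills column by column, so build an Ls-by-n matrix and transpose
% it to get one section per row, like datamatrix in the question.
datamatrix = reshape(data(1:n*Ls), Ls, n).';

for k = 1:n
    section = datamatrix(k, :);  % k-th section as a row vector
    disp(section)
end
```

The transpose makes a copy; for very large data it may be preferable to keep the Ls-by-n layout and slice columns instead.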
Next: create a cell
Ary = 1:10;
n = 2; % number of sections
L = floor(length(Ary)/n);
% allocate memory
C = cell(1,n);
%% create new structure
for i = 1:n
% access the content of a cell with {}
C{i} = Ary((i-1)*L+1:i*L);
end
%% access new structure (looping over entries)
for i = 1:length(C)
% access the content of a cell with {}
ary_sct = C{i};
% do something
end
Pro: You can store anything in a cell. Every data type and -- what is often more important -- of any dimension
Con: Accessing the content (through {}) versus accessing the element (through ()) is a bit annoying if you are a beginner; each element requires a memory overhead of about 60 bytes, as cells are pointers that need to store where and to what they point.
Next: use a struct
Ary = 1:10;
n = 2; % number of sections
L = floor(length(Ary)/n);
% create empty struct
S = struct();
%% create new structure
for i = 1:n
% create fieldname (must start with a character!)
fld = num2str(i,'F%d');
% write to field (note the brackets)
S.(fld) = Ary((i-1)*L+1:i*L);
end
%% access new structure (looping over fieldnames)
% get all field names
FldNms = fieldnames(S);
for i = 1:length(FldNms)
% access field names (this is a cell!)
fld = FldNms{i};
% access struct
ary_sct = S.(fld);
% do something
end
Pro: Field names are convenient to keep the overview of your data
Con: Accessing field names in a loop is a bit tedious; each element requires a memory overhead of about 60 bytes, as fields are pointers that need to store where and to what they point.
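The cell and struct loops above use floor, so any leftover elements are silently dropped when the length is not divisible by n. A minimal sketch of one way to keep every element, assuming it is fine for the first few sections to be one element longer than the rest, uses mat2cell:

```matlab
Ary = 1:10;
n = 3;                                % number of sections
base = floor(numel(Ary) / n);         % minimum section length
extra = mod(numel(Ary), n);           % leftover elements to distribute
sizes = base * ones(1, n);
sizes(1:extra) = sizes(1:extra) + 1;  % first 'extra' sections get one more

% mat2cell splits the row vector into n pieces with the given lengths.
C = mat2cell(Ary, 1, sizes);

for i = 1:numel(C)
    ary_sct = C{i};
    % do something
end
```

Here the section lengths are 4, 3 and 3, so no element is lost.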

Is there an Octave equivalent of Matlab's `contains` function?

Is there an equivalent of MATLAB's contains function in Octave? Or, is there a simpler solution than writing my own function in Octave to replicate this functionality? I am in the process of switching to Octave from MATLAB and I use contains throughout my MATLAB scripts.
Let's stick to the example from the documentation on contains. In Octave, there are no (double-quoted) strings as introduced in MATLAB R2017a, so we need to switch to plain, old (single-quoted) char arrays. The "See also" section of that documentation links to strfind. We'll use this function, which is also implemented in Octave, to create an anonymous function mimicking the behaviour of contains. We will also need cellfun, which is available in Octave, too. Please see the following code snippet:
% Example adapted from https://www.mathworks.com/help/matlab/ref/contains.html
% Names; char arrays ("strings") in cell array
str = {'Mary Ann Jones', 'Paul Jay Burns', 'John Paul Smith'}
% Search pattern; char array ("string")
pattern = 'Paul';
% Anonymous function mimicking contains
contains = @(str, pattern) ~cellfun('isempty', strfind(str, pattern));
% contains = @(str, pattern) ~cellfun(@isempty, strfind(str, pattern));
TF = contains(str, pattern)
The output is as follows:
str =
{
[1,1] = Mary Ann Jones
[1,2] = Paul Jay Burns
[1,3] = John Paul Smith
}
TF =
0 1 1
That should resemble the output of MATLAB's contains.
So, in the end - yes, you need to replicate the functionality by yourself, since strfind is no exact replacement.
Hope that helps!
EDIT: Use 'isempty' instead of @isempty in the cellfun call to get a faster in-built implementation (see carandraug's comment below).
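The anonymous function above expects a cell array: for a plain char array, strfind returns a numeric index vector and cellfun rejects it. A small sketch that accepts both, assuming a single, case-sensitive char pattern is enough (containsCompat is a hypothetical name, chosen to avoid shadowing contains):

```matlab
% cellstr wraps a char array in a 1x1 cell and leaves a cell array of
% char vectors unchanged, so strfind always returns a cell here.
containsCompat = @(str, pattern) ~cellfun('isempty', strfind(cellstr(str), pattern));

TF1 = containsCompat('Paul Jay Burns', 'Paul')
TF2 = containsCompat({'Mary Ann Jones', 'John Paul Smith'}, 'Paul')
```

This does not cover the multiple-pattern or IgnoreCase forms of MATLAB's contains.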
I'm not too familiar with MuPad functions, but it looks like this is reinventing the ismember function (which exists in both Matlab and Octave).
E.g.
ismember( {'jim', 'stan'}, {'greta', 'george', 'jim', 'jenny'} )
% ans = 1 0
i.e. 'jim' is a member of {'greta', 'george', 'jim', 'jenny'}, whereas 'stan' is not.
Furthermore, ismember also supports finding the index of the matched element:
[BoolVal, Idx] = ismember( {'jim', 'stan'}, {'greta', 'george', 'jim', 'jenny'} )
% BoolVal = 1 0
% Idx = 3 0
Personally I use my own implementation, which returns 1 if a string str contains entire substring sub:
function res = containsStr(str, sub)
res = 0;
strCharsCount = length(str);
subCharsCount = length(sub);
startCharSub = sub(1);
% loop over characters of main string
for ic = 1:strCharsCount
currentChar = str(ic);
% if a substring starts from current character
if (currentChar == startCharSub)
%fprintf('Match! %s = %s\n', currentChar, startCharSub);
matchedCharsCount = 1;
% loop over characters of substring
for ics = 2:subCharsCount
nextCharIndex = ic + (ics - 1);
% if there's enough chars in the main string
if (nextCharIndex <= strCharsCount)
nextChar = str(nextCharIndex);
nextCharSub = sub(ics);
if (nextChar == nextCharSub)
matchedCharsCount = matchedCharsCount + 1;
end
end
end
%fprintf('Matched chars = %d / %d\n', matchedCharsCount, subCharsCount);
% the substring is inside the main one
if (matchedCharsCount == subCharsCount)
res = 1;
end
end
end
end

MATLAB: Using a for loop within another function

I am trying to concatenate several structs. What I take from each struct depends on a function that requires a for loop. Here is my simplified array:
t = 1;
for t = 1:5 %this isn't the for loop I am asking about
a(t).data = t^2; %it just creates a simple struct with 5 data entries
end
Here I am doing concatenation manually:
A = [a(1:2).data a(1:3).data a(1:4).data a(1:5).data] %concatenation function
As you can see, the range (1:2), (1:3), (1:4), and (1:5) can be looped, which I attempt to do like this:
t = 2;
A = [for t = 2:5
a(1:t).data
end]
This results in an error "Illegal use of reserved keyword "for"."
How can I do a for loop within the concatenate function? Can I do loops within other functions in Matlab? Is there another way to do it, other than copy/pasting the line and changing 1 number manually?
You were close to getting it right! This will do what you want.
A = []; %% note: no need to initialize t, the for-loop takes care of that
for t = 2:5
A = [A a(1:t).data]
end
This seems strange though...you are concatenating the same elements over and over...in this example, you get the result:
A =
1 4 1 4 9 1 4 9 16 1 4 9 16 25
If what you really need is just the .data elements concatenated into a single array, then that is very simple:
A = [a.data]
A couple of notes about this: why are the brackets necessary? Because the expressions
a.data, a(1:t).data
don't return all the numbers in a single array, like many functions do. They return a separate answer for each element of the structure array. You can test this like so:
>> [b,c,d,e,f] = a.data
b =
1
c =
4
d =
9
e =
16
f =
25
Five different answers there. But MATLAB gives you a cheat -- the square brackets! Put an expression like a.data inside square brackets, and all of a sudden those separate answers are compressed into a single array. It's magic!
Another note: for very large arrays, the for-loop version here will be very slow. It would be better to allocate the memory for A ahead of time. In the for-loop here, MATLAB is dynamically resizing the array each time through, and that can be very slow if your for-loop has 1 million iterations. If it's less than 1000 or so, you won't notice it at all.
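As a sketch of what preallocating could look like for this example (the variable pos and the way the struct array is built are illustrative choices, not part of the original code): the t-th pass contributes t elements, so the final length is sum(2:5).

```matlab
% Build the same struct array as in the question (data = 1, 4, 9, 16, 25).
a = struct('data', num2cell((1:5).^2));

% Each pass adds t elements, so the total length is known in advance.
A = zeros(1, sum(2:5));
pos = 0;                         % number of elements written so far
for t = 2:5
    A(pos+1:pos+t) = [a(1:t).data];
    pos = pos + t;
end
```

This yields the same 14-element vector as the growing version, without resizing A inside the loop.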
Finally, the reason that HBHB could not run your struct creating code at the top is that it doesn't work unless a is already defined in your workspace. If you initialize a like this:
%% t = 1; %% by the way, you don't need this, the t value is overwritten by the loop below
a = []; %% always initialize!
for t = 1:5 %this isn't the for loop I am asking about
a(t).data = t^2; %it just creates a simple struct with 5 data entries
end
then it runs for anyone the first time.
As an appendix to gariepy's answer:
The matrix concatenation
A = [A k];
as a way of appending to it is actually pretty slow. You end up reassigning N elements every time you concatenate to an N size vector. If all you're doing is adding elements to the end of it, it is better to use the following syntax
A(end+1) = k;
In MATLAB this is optimized such that on average you only need to reassign about 80% of the elements in a matrix. This might not seem like much, but for 10k elements it adds up to roughly an order of magnitude of difference in time (at least for me).
Bear in mind that this works only in MATLAB 2012b and higher, as described in this thread: Octave/Matlab: Adding new elements to a vector
This is the code I used. tic/toc syntax is not the most accurate method for profiling in MATLAB, but it illustrates the point.
close all; clear all; clc;
t_cnc = []; t_app = [];
N = 1000;
for n = 1:N;
% Concatenate
tic;
A = [];
for k = 1:n;
A = [A k];
end
t_cnc(end+1) = toc;
% Append
tic;
A = [];
for k = 1:n;
A(end+1) = k;
end
t_app(end+1) = toc;
end
t_cnc = t_cnc*1000; t_app = t_app*1000; % Convert to ms
% Fit a straight line on a log scale
P1 = polyfit(log(1:N),log(t_cnc),1); P_cnc = @(x) exp(P1(2)).*x.^P1(1);
P2 = polyfit(log(1:N),log(t_app),1); P_app = @(x) exp(P2(2)).*x.^P2(1);
% Plot and save
loglog(1:N,t_cnc,'.',1:N,P_cnc(1:N),'k--',...
1:N,t_app,'.',1:N,P_app(1:N),'k--');
grid on;
xlabel('log(N)');
ylabel('log(Elapsed time / ms)');
title('Concatenate vs. Append in MATLAB 2014b');
legend('A = [A k]',['O(N^{',num2str(P1(1)),'})'],...
'A(end+1) = k',['O(N^{',num2str(P2(1)),'})'],...
'Location','northwest');
saveas(gcf,'Cnc_vs_App_test.png');

Read multiple files with for loop

My code is below. In the code, I am evaluating only the data in the 'fb2010' file. I want to add the other files ('fb2020', 'fb2030', and 'fb2040') and evaluate their data with the same code. My question is how to apply a for loop to include the other data files. I tried, but I got confused by the for loop.
load('fb2010'); % loading the data
x = fb2010(3:1:1502,:);
% y_filt = filter(b,a,x); % filtering the received signal
y_filt= filter(b,a,x,[],2);
%%%%%%% fourier transform
nfft = length(y_filt);
res = fft(y_filt,nfft,2)/nfft;
res2 = res(:,1:nfft/2+1); %%%% taking single sided spectrum
res3 = fft(res2,[],2);
for i = 3:1:1500 %%%% dividing each row by first row.
resd(i,:) = res3(i,:)./res3(1,:);
end
I'm assuming that your files are MAT-files, not ASCII. You can do this by having load return a struct and using dynamic field referencing:
n = 4;
for i = 1:n
vname = ['fb20' int2str(i) '0']; % Create file/variable name based on index
s = load(vname); % Load data as struct (overwriting previous s)
x = s.(vname)(3:1:1502,:); % Access struct with dynamic field reference
% Rest of your code
...
end
If you're using a plain ASCII file, load won't produce a struct. However, such files are much simpler (see documentation for load/save). The following code would probably work:
n = 4;
for i = 1:n
vname = ['fb20' int2str(i) '0']; % Create file/variable name based on index
s = load(vname); % Load data as matrix (overwriting previous s)
x = s(3:1:1502,:); % Directly index matrix
% Rest of your code
...
end
It would be a good idea to add the file extension to your load command to make your code more readable.
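A self-contained sketch of the MAT-file variant with explicit extensions, assuming each file holds one variable named after the file. The temporary folder, the random stand-in data and the mean call are hypothetical placeholders for the real files and processing:

```matlab
% Create small stand-in MAT-files in a temporary folder (hypothetical data).
folder = tempname;
mkdir(folder);
years = [2010 2020 2030 2040];
tmp = struct();
for k = 1:numel(years)
    vname = sprintf('fb%d', years(k));
    tmp.(vname) = rand(4, 3) + k;
    % Save only the field vname as a variable of the same name.
    save(fullfile(folder, [vname '.mat']), '-struct', 'tmp', vname);
end

% Load each file and keep one result per file in a cell array.
results = cell(1, numel(years));
for k = 1:numel(years)
    vname = sprintf('fb%d', years(k));
    s = load(fullfile(folder, [vname '.mat']));  % explicit extension
    x = s.(vname);                               % dynamic field reference
    results{k} = mean(x, 2);                     % placeholder for the real processing
end
```

Storing the per-file results in a cell array avoids overwriting them on each pass.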

looping technique with the data using variable name?

This is my program, in which x is my data. I have other data variables named af4, af7 and af8. How can I loop over them in MATLAB so that x automatically changes to af4, then af7, and finally af8?
x=af3;
d = fdesign.lowpass('Fp,Fst,Ap,Ast',4,5,1,40,128);
Hd = design(d,'butter');
fvtool(Hd);
y_delta = filter(Hd,x);
How do you generate these variables af4, af7 and af8? If you can create them as cells in a cell array or as fields in a struct, your life will be much easier.
If you have no control over the variables, you can use eval:
varNames = {'af3', 'af4', 'af7', 'af8' }; % as strings
for vi=1:numel(varNames)
x = eval( varNames{vi} ); % here's the trick
% continue here with x...
end
Note, however, that using eval is strongly discouraged.
I think this is what you could use:
xCell = {af3, af4, af7, af8};
for xi = 1:numel(xCell)
x = xCell{xi};
% do what you want to do with x
end
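If the signals can be stored as fields of a struct from the start, the loop needs neither eval nor a hand-built list. A sketch under that assumption, with random stand-in signals and a simple moving-average filter in place of the Butterworth design from the question (both are placeholders):

```matlab
% Hypothetical stand-in signals; in practice these would be the recorded data.
signals = struct('af3', rand(1, 100), 'af4', rand(1, 100), ...
                 'af7', rand(1, 100), 'af8', rand(1, 100));

% Placeholder filter: a 4-point moving average (not the question's design).
b = ones(1, 4) / 4;
a = 1;

names = fieldnames(signals);
filtered = struct();
for k = 1:numel(names)
    x = signals.(names{k});                 % dynamic field reference
    filtered.(names{k}) = filter(b, a, x);  % result stored under the same name
end
```

Keeping the results in a second struct with matching field names makes it easy to tell which output belongs to which input.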