What is contained in the "function workspace" field in .mat file? - matlab

I'm working with .mat files which are saved at the end of a program. The command is save foo.mat so everything is saved. I'm hoping to determine if the program changes by inspecting the .mat files. I see that from run to run, most of the .mat file is the same, but the field labeled __function_workspace__ changes somewhat.
(I am inspecting the .mat files via scipy.io.loadmat -- just loading the files and printing them out as plain text and then comparing the text. I found that save -ascii in Matlab doesn't put string labels on things, so going through Python is roundabout, but I get labels and that's useful.)
I am trying to determine from where these changes originate. Can anyone explain what __function_workspace__ contains? Why would it not be the same from one run of a given program to the next?
The variables I am really interested in are the same, but I worry that I might be overlooking some changes that might come back to bite me. Thanks in advance for any light you can shed on this problem.
EDIT: As I mentioned in a comment, the value of __function_workspace__ is an array of integers. I looked at the elements of the array and it appears that these numbers are ASCII or non-ASCII character codes. I see runs of characters which look like names of variables or functions, so that makes sense. But there are also some characters (non-ASCII) which don't seem to be part of a name, and there are a lot of null (zero) characters too. So aside from seeing names of things in __function_workspace__, I'm not sure what that stuff is exactly.
SECOND EDIT: I found that after commenting out calls to plotting functions, the content of __function_workspace__ is the same from one run of the program to the next, so that's great. At this point the only difference from one run to the next is that there is a __header__ field which contains a timestamp for the time at which the .mat file was created, which changes from run to run.
THIRD EDIT: I found an article, http://nbviewer.jupyter.org/gist/mbauman/9121961 "Parsing MAT files with class objects in them", about reverse-engineering the __function_workspace__ field. Thanks to Matt Bauman for this very enlightening article and thanks to #mpaskov for the pointer. It appears that __function_workspace__ is an undocumented catch-all for various stuff, only one part of which is actually a "function workspace".

1) Diffing .mat files
You may want to take a look at DiffPlug. It can do diffs of MAT files and I believe there is a command line interface for it as well.
2) Contents of function_workspace
SciPy's __function_workspace__ refers to a special variable at the end of a MAT file that contains extra data needed for reference types (e.g. table, string, handle, etc.) and various other stuff that is not covered by the official documentation. The name is misleading as it really refers to the "Subsystem" (briefly mentioned in the official spec as an offset in the header).
For example, if you save a reference type, e.g., emptyString = "", the resulting .mat will contain the following two entries:
(1) The variable itself. It looks sort of like a UInt32 matrix, but is actually an Opaque MCOS Reference (MATLAB Class Object System) to a string object at some location in the subsystem.
[0] Compressed (81 bytes, position = 128)
[0] Matrix (144 bytes, position = 0)
[0] UInt32[2] = [17, 0] // Opaque
[1] Int8[11] = ['emptyString'] // Variable Name
[2] Int8[4] = ['MCOS'] // Object Type
[3] Int8[6] = ['string'] // Class Name
[4] Matrix (72 bytes, position = 72)
[0] UInt32[2] = [13, 0] // UInt32
[1] Int32[2] = [6, 1] // Dimensions
[2] Int8[0] = [''] // Variable Name (not needed)
[3] UInt32[6] = [-587202560, 2, 1, 1, 1, 1] // Data (Reference Target)
(2) A UInt8 matrix without name (SciPy renamed this to __function_workspace__) at the end of the file. Aside from the missing name it looks like a standard matrix, but the data is actually another MAT file (with a reduced header) that contains the real data.
[1] Compressed (251 bytes, position = 217)
[0] Matrix (968 bytes, position = 0)
[0] UInt32[2] = [9, 0] // UInt8
[1] Int32[2] = [1, 920] // Dimensions
[2] Int8[0] = [''] // Variable Name
[3] ... 920 bytes ... // Data (Nested MAT File)
The format of the data is unfortunately completely undocumented and somewhat of a mess. I could post the contents of the Subsystem, but it gets somewhat overwhelming even for such a simple case. It's essentially a MAT file that contains a struct that contains a special variable (MCOS FileWrapper__) that contains a cell array with various values, including one that magically encodes various Object Properties.
Matt Bauman has done some great reverse engineering efforts (Parsing MAT files with class objects in them) that I believe all supporting implementations are based on. The MFL Java library contains a full (read-only) implementation of this (see McosFileWrapper.java).
Some updates on Matt Bauman's post that we found are:
The MCOS reference can refer to an array of handle objects and may have more than 6 values. It contains sizing information followed by an array of indices (see McosReference.java).
The Object Id field looks like a unique id, but the order seems random and sometimes doesn't match. I don't know what this value is, but completely ignoring it seems to work well :)
I've seen Segment 5 populated in .fig files, but I haven't been able to narrow down what's in there yet.
Edit: Fyi, once the string object is correctly parsed and all properties are filled in, the actual string value is encoded in yet another undocumented format (see testDoubleQuoteString)

Related

Find category of MATLAB mlint warning ID

I'm using the checkcode function in MATLAB to give me a struct of all error messages in a supplied filename along with their McCabe complexity and ID associated with that error. i.e;
info = checkcode(fileName, '-cyc','-id');
In MATLAB's preferences, there is a list of all possible errors, and they are broken down into categories. Such as "Aesthetics and Readability", "Syntax Errors", "Discouraged Function Usage" etc.
Is there a way to access these categories using the error ID gained from the above line of code?
I tossed around different ideas in my head for this question and was finally able to come up with a mostly elegant solution for how to handle this.
The Solution
The critical component of this solution is the undocumented -allmsg flag of checkcode (or mlint). If you supply this argument, then a full list of mlint IDs, severity codes, and descriptions are printed. More importantly, the categories are also printed in this list and all mlint IDs are listed underneath their respective mlint category.
The Execution
Now we can't simply call checkcode (or mlint) with only the -allmsg flag because that would be too easy. Instead, it requires an actual file to try to parse and check for errors. You can pass any valid m-file, but I have opted to pass the built-in sum.m because the actual file itself only contains help information (as it's real implementation is likely C++) and mlint is therefore able to parse it very rapidly with no warnings.
checkcode('sum.m', '-allmsg');
An excerpt of the output printed to the command window is:
INTER ========== Internal Message Fragments ==========
MSHHH 7 this is used for %#ok and should never be seen!
BAIL 7 done with run due to error
INTRN ========== Serious Internal Errors and Assertions ==========
NOLHS 3 Left side of an assignment is empty.
TMMSG 3 More than 50,000 Code Analyzer messages were generated, leading to some being deleted.
MXASET 4 Expression is too complex for code analysis to complete.
LIN2L 3 A source file line is too long for Code Analyzer.
QUIT 4 Earlier syntax errors confused Code Analyzer (or a possible Code Analyzer bug).
FILER ========== File Errors ==========
NOSPC 4 File <FILE> is too large or complex to analyze.
MBIG 4 File <FILE> is too big for Code Analyzer to handle.
NOFIL 4 File <FILE> cannot be opened for reading.
MDOTM 4 Filename <FILE> must be a valid MATLAB code file.
BDFIL 4 Filename <FILE> is not formed from a valid MATLAB identifier.
RDERR 4 Unable to read file <FILE>.
MCDIR 2 Class name <name> and #directory name do not agree: <FILE>.
MCFIL 2 Class name <name> and file name do not agree: <file>.
CFERR 1 Cannot open or read the Code Analyzer settings from file <FILE>. Using default settings instead.
...
MCLL 1 MCC does not allow C++ files to be read directly using LOADLIBRARY.
MCWBF 1 MCC requires that the first argument of WEBFIGURE not come from FIGURE(n).
MCWFL 1 MCC requires that the first argument of WEBFIGURE not come from FIGURE(n) (line <line #>).
NITS ========== Aesthetics and Readability ==========
DSPS 1 DISP(SPRINTF(...)) can usually be replaced by FPRINTF(...).
SEPEX 0 For better readability, use newline, semicolon, or comma before this statement.
NBRAK 0 Use of brackets [] is unnecessary. Use parentheses to group, if needed.
...
The first column is clearly the mlint ID, the second column is actually a severity number (0 = mostly harmless, 1 = warning, 2 = error, 4-7 = more serious internal issues), and the third column is the message that is displayed.
As you can see, all categories also have an identifier but no severity, and their message format is ===== Category Name =====.
So now we can just parse this information and create some data structure that allows us to easily look up the severity and category for a given mlint ID.
Again, though, it can't always be so easy. Unfortunately, checkcode (or mlint) simply prints this information out to the command window and doesn't assign it to any of our output variables. Because of this, it is necessary to use evalc (shudder) to capture the output and store it as a string. We can then easily parse this string to get the category and severity associated with each mlint ID.
An Example Parser
I have put all of the pieces I discussed previously together into a little function which will generate a struct where all of the fields are the mlint IDs. Within each field you will receive the following information:
warnings = mlintCatalog();
warnings.DWVRD
id: 'DWVRD'
severity: 2
message: 'WAVREAD has been removed. Use AUDIOREAD instead.'
category: 'Discouraged Function Usage'
category_id: 17
And here's the little function if you're interested.
function [warnings, categories] = mlintCatalog()
% Get a list of all categories, mlint IDs, and severity rankings
output = evalc('checkcode sum.m -allmsg');
% Break each line into it's components
lines = regexp(output, '\n', 'split').';
pattern = '^\s*(?<id>[^\s]*)\s*(?<severity>\d*)\s*(?<message>.*?\s*$)';
warnings = regexp(lines, pattern, 'names');
warnings = cat(1, warnings{:});
% Determine which ones are category names
isCategory = cellfun(#isempty, {warnings.severity});
categories = warnings(isCategory);
% Fix up the category names
pattern = '(^\s*=*\s*|\s*=*\s*$)';
messages = {categories.message};
categoryNames = cellfun(#(x)regexprep(x, pattern, ''), messages, 'uni', 0);
[categories.message] = categoryNames{:};
% Now pair each mlint ID with it's category
comp = bsxfun(#gt, 1:numel(warnings), find(isCategory).');
[category_id, ~] = find(diff(comp, [], 1) == -1);
category_id(end+1:numel(warnings)) = numel(categories);
% Assign a category field to each mlint ID
[warnings.category] = categoryNames{category_id};
category_id = num2cell(category_id);
[warnings.category_id] = category_id{:};
% Remove the categories from the warnings list
warnings = warnings(~isCategory);
% Convert warning severity to a number
severity = num2cell(str2double({warnings.severity}));
[warnings.severity] = severity{:};
% Save just the categories
categories = rmfield(categories, 'severity');
% Convert array of structs to a struct where the MLINT ID is the field
warnings = orderfields(cell2struct(num2cell(warnings), {warnings.id}));
end
Summary
This is a completely undocumented but fairly robust way of getting the category and severity associated with a given mlint ID. This functionality existed in 2010 and maybe even before that, so it should work with any version of MATLAB that you have to deal with. This approach is also a lot more flexible than simply noting what categories a given mlint ID is in because the category (and severity) will change from release to release as new functions are added and old functions are deprecated.
Thanks for asking this challenging question, and I hope that this answer provides a little help and insight!
Just to close this issue off. I've managed to extract the data from a few different places and piece it together. I now have an excel spreadsheet of all matlab's warnings and errors with columns for their corresponding ID codes, category, and severity (ie, if it is a warning or error). I can now read this file in, look up ID codes I get from using the 'checkcode' function and draw out any information required. This can now be used to create analysis tools to look at the quality of written scripts/classes etc.
If anyone would like a copy of this file then drop me a message and I'll be happy to provide it.
Darren.

Matlab save sequence of mat files from convertTDMS stored in cell array to sequence of mat files

I have data stored in the .tdms format, gathering the data of many sensors, measured every second, every day. A new tdms file is created every day, and stored in a folder per month. Using the convertTDMS function, I have converted these tdms files to mat files.
As there are some errors in some of the measurements(e.g. negative values which can not physically occur), I have performed some corrections by loading one mat file at a time, do the calculations and then save the data into the original .mat file.
However, when I try to do what I described above in a loop (so: load .mat in folder, do calculations on one mat file (or channel therein), save mat file, repeat until all files in the folder have been done), I end up running into trouble with the limitations of the save function: so far I save all variables (or am unable to save) in the workspace when using the code below.
for k = 1:nFiles
w{k,1} = load(wMAT{k,1});
len = length(w{k,1}.(x).(y).(z));
pos = find(w{k,1}.(x).(y).(z)(1,len).(y)<0); %Wind speed must be >0 m/s
for n = 1:length(pos)
w{k,1}.(x).(y).(z)(1,len).(y)(pos(n)) = mean([w{k,1}.(x).(y).(z)(1,len).(y)(pos(n)+1),...
w{k,1}.(x).(y).(z)(1,len).(y)(pos(n)-1)],2);
end
save( name{k,1});
%save(wMAT{k,1},w{k,1}.(x),w{k,1}.ConvertVer,w{k,1}.ChanNames);
end
A bit of background information: the file names are stored in a cell array wMAT of length nFiles in the folder. Each cell in the cell array wMAT stores the fullfile path to the mat files.
The data of the files is loaded and saved into the cell array w, also of length nFiles.
Each cell in "w" has all the data stored from the tdms to mat conversion, in the format described in the convertTDMS description.
This means: to get at the actual data, I need to go from the
cell in the cell array w{k,1} (my addition)
to the struct array "ConvertedData" (Structure of all of the data objects - part of convertTDMS)
to the struct array below called "Data" (convertTDMS)
to the struct array below called "MeasuredData" (convertTDMS) -> at this level, I can access the channels which store the data.
to finally access/manipulate the values stored, I have to select a channel, e.g. (1,len), and then go via the struct array to the actual values (="Data"). (convertTDMS)
In Matlab format, this looks like "w{1, 1}.ConvertedData.Data.MeasuredData(1, len).Data(1:end)" or "w{1, 1}.ConvertedData.Data.MeasuredData(1, len).Data".
To make typing easier, I took
x = 'ConvertedData';
y = 'Data';
z = 'MeasuredData';
allowing me to write instead:
w{k,1}.(x).(y).(z)(1,len).(y)
using the dot notation.
My goal/question: I want to load the values stored in a .mat file from the original .tdms files in a loop to a cell array (or if I can do better than a cell array: please tell me), do the necessary calculations, and then save each 'corrected' .mat file using the original name.
So far, I have gotten a multitude of errors from trying a variety of solutions, going from "getfieldnames", trying to pass the name of the (dynamically changing) variable(s), etc.
Similar questions which have helped me get in the right direction include Saving matlab files with a name having a variable input, Dynamically Assign Variables in Matlab and http://www.mathworks.com/matlabcentral/answers/4042-load-files-containing-part-of-a-string-in-the-file-name-and-the-load-that-file , yet the result is that I am still no closer than doing manual labour in this case.
Any help would be appreciated.
If I understand your ultimate goal correctly, I think you're pretty much there. I think you're trying to process your .mat files and that the loading of all of the files into a cell array is not a requirement, but just part of your solution? Assuming this is the case, you could just load the data from one file, process it, save it and then repeat. This way you only ever have one file loaded at a time and shouldn't hit any limits.
Edit
You could certainly make a function out of your code and then call that in a loop, passing in the file name to modify. Personally I'd probably do that as I think it's neater solution. If you don't want to do that though, you could just replace w{k,1} with w then each time you load a file w would be overwritten. If you wanted to explicitly clear variables you can use the clear command with a space separated list of variables e.g. clear w len pos, but I don't think that this is necessary.

Reading structured variable from MAT file

I am performing an analysis which involves simulation of over 1000 cases. I extracting lots of data for each case as well (about 70MB). Currently I am saving the results for each case as:
Vessel.TotalForce
Vessel.WindForce
Vessel.CurrentForce
Vessel.WaveForce
Vessel.ConnectionForce
...
Line1.EffectiveTension
Line1.X
Line1.Y
Line2.EfectiveTension
Line2.X
Line2.Y
...
save('CaseNo1.mat')
Now, I need to perform my analysis for CaseNo1.mat to CaseNo1000. Initially I planned to create a Database.mat file by loading all cases in it and then accessing any variable using h5read. This way Matlab doesn't need to load all the data at a time. However, I am concerned now that my database file will be too big.
Is there any way I can read the structured variables from individual case files for example CaseNo1.mat without loading the CaseNo1.mat file in memory.
Matlab examples shows loading just the variables directly from MAT file without loading the whole MAT file. But I am not sure how to read structures data the same way.
x=load('CaseNo1.mat','Line1.X')
says Line1.X not found. But it's there. The command is not correct to access the data. Also tried using h5read, but it says CaseNo1.mat is not an HDF5 file.
Can anyone help with this.
Apart from this, I would also appreciate if there is any suggestion about performing such data intensive analysis.
I was wrong! I'm leaving my old answer for context, though I've edited it to reference this one. I thought I had used matfile() in that way before, but I hadn't. I just did a thorough search and ran a few test cases. You've actually run into a limitation of the way Matlab handles and references structures stored in .mat files. There is, however, a solution. It does involve some refactoring of your original code, but it shouldn't be too egregious.
Vessel_TotalForce
Vessel_WindForce
Vessel_CurrentForce
Vessel_WaveForce
Vessel_ConnectionForce
...
Line1_EffectiveTension
Line1_X
Line1_Y
Line2_EfectiveTension
Line2_X
Line2_Y
...
save('CaseNo1.mat')
Then to access, just use matfile (or load) as you were before. Like so:
Vessel_WaveForce = load('CaseNo1.mat'', 'Vessel_WaveForce')
It's important to note that this restriction doesn't appear to be caused by anything you've chosen to do in your program, but rather is imposed by the way Matlab interacts with it's native storage files when they contain structures.
EDIT: This answer works, but doesn't actually solve the problem posed in OP's question. I thought I had used matfile to generate a handle that I could access, but I was wrong. See my other answer for details.
You could use matfile, like so:
myMatFileHandle = matfile('caseNo1.mat');
thisVessel = myMatFileHandle.vessel;
Also, from the little bit I can see, you seem to be on the right track for high-volume analysis. Just remember to use sparse when applicable, and generally avoid conditionals inside of loops if possible.
Good luck!
The objective of storing data in structured format is:
To be organized
Easy scripting post processor where looping through data under one data set it required.
To store structured dataset containing integer, floating and string variables in MAT file and to be able to read just the required variable using h5read command was sought. Matlab load command is not able to read variable beyond first level from stored data in a MAT file. The h5write couldn't write string variables. Hence needed a work around to solve this problem.
To do this I have used following method:
filename = 'myMatFile';
Vessel.TotalForce = %store some data
Vessel.WindForce = %store some data
Vessel.CurrentForce = %store some data
Vessel.WaveForce = %store some data
Vessel.ConnectionForce = %store some data
...
Lin1.LineType = 'Wire'
Line1.ArcLength_0.EffectiveTension = %store some data
Line1.ArcLength_50.EffectiveTension= %store some data
Line1.ArcLength_100.EffectiveTension= %store some data
Lin2.LineType = 'Chain'
Line2.ArcLength_0.EffectiveTension= %store some data
Line2.ArcLength_50.EffectiveTension= %store some data
Line2.ArcLength_100.EffectiveTension= %store some data
save([filename '_temp.mat']);
PointToMat=matfile([filename '.mat'],'Writable',true);
PointToMat.(char(filename)) = load([filename '_temp.mat']);
delete([filename '_temp.mat']);
Now to read from the MAT file created, we can use h5read as usual. To extract the EffectiveTension for Line1, ArcLength_0:
EffectiveTension = h5read([filename '.mat'],['/' filename '/Line1/ArcLength_0/EffectiveTension']);
For string variables, h5read returns decimal values corresponding to each character. To obtain the actual string I used:
name = char(h5read([filename '.mat'],['/' filename '/Line1/LineType']));
Tried this method on my data set which is about 200MB and I could process them pretty fast. Hope this would help someone someday.
Short answer:
Having saved the data into a MAT file with the '-v7.3' option, use something like h5read(filename, '/Line2/X') to read just one structure field. You can even read an array partially, for example:
s.a = 1:100;
save('test.mat', '-v7.3', 's');
clear
h5read('test.mat', '/s/a', [1 10], [1 5], [1 3])
returns each third element of the 1:100 array, starting with the 10th element and returning 5 values:
10 13 16 19 22
Long answer:
See answer by #Amitava for the more elaborate code and topic coverage.

Matlab: Understanding a piece of code

I have a matlab code which is for printing a cell array to excel. The size of matrix is 50x13.
The row 1 is the column names.
Column 1 is dates and rest columns are numbers.
The dateformat being defined in the code is:
dFormat = struct;
dFormat.Style = struct( 'NumberFormat', '_(* #,##0.00_);_(* (#,##0.00);_(* "-"??_);_(#_)' );
dFormat.Font = struct( 'Size', 8 );
Can someone please explain me what the dFormat.Style code means ?
Thanks
The first line creates an empty struct (struct with no fields) called dFormat. A structure can contain pretty much anything in one of its fields, including another structure. The second line adds a field called 'Style' to the dFormat struct and sets it equal to another struct with a field called 'NumberFormat'. The 'NumberFormat' field is set equal to that long string of characters. You now have a structure of structures. The third line is similar to the second.
Note that the first line isn't really necessary unless dFormat already exists and it needs to be "zeroed out" as dFormat.Style with create it implicitly. However, using the struct function can make code more readable in some cases as objects use a similar notation for access methods and properties. In other words, all of your code could be replaced with:
dFormat.Style.NumberFormat = '_(* #,##0.00_);_(* (#,##0.00);_(* "-"??_);_(#_)';
dFormat.Font.Size = 8;
See this video from the MathWorks for more details and this list of helpful structure functions and examples.
#horchler already elaborated on structs, but I imagine you may actually be more interested in the content of this structs Style field.
In case you are solely interested in _(* #,##0.00_);_(* (#,##0.00);_(* "-"??_);_(#_), that does not really look like something MATLAB related to me.
My best guess is that this code is used to later feed some other program, for examle to build an excel file.

Writing Private Dicom data in matlab without modifying the dictionary

I am reading a dicom file in matlab and modifying some data of it and trying to save it into another file, but while doing so, the private dicom data are either not written at all (when 'WritePrivate' is set to 0) or it's written as a UINT8 array which become incomprehensible and useless. I even tried to copy the data that I get in from the original dicom file to a new structure and write to a new dicom file but even though the private data remains fine in new structure it doesn't remain so in the new dicom file. Is there any way to keep this private data intact while copying in to a new dicom file without changing the matlab dicom dictionary?
I have provided the following code to show what I'm trying to do.
X=dicomread('Bad011_4CH_01.dcm');
metadata = dicominfo('Bad011_4CH_01.dcm');
metadata.PatientName.FamilyName='LastName';
metadata.PatientName.GivenName='FirstName';
birthday=metadata.PatientBirthDate;
year=birthday(1,1:4);
newyear=strcat(year,'0101');
metadata.PatientBirthDate=newyear;
names=fieldnames(metadata);
h=metadata;
dicomwrite(X,'example.dcm',h,'CreateMode','copy');
newh=dicominfo('example.dcm');
Here the data in newh contains none of the private data. If I change the code to the following
dicomwrite(X,'example.dcm',h,'CreateMode','copy','WritePrivate',1);
In this case the private data gets totally changed to some UIN8 array and useless. The ideal solution for my task would be to enable keeping the private data in the newly created dicom file without changing the matlab dicom dictionary.
Have you tried something like:
dicomwrite(uint16(image), fileName, 'ObjectType', 'MR Image Storage', ...
'WritePrivate', true, header);
where "header" is a struct composed of name-value pairs using the same format as header data that you would get from MATLAB's dicominfo function? My general approach to image creation in MATLAB is to avoid using CreateMode 'copy' and instead build my own DICOM header by explicitly copying the attributes that it makes sense to copy and generating my own values for attributes that should have new values.
To write private tags, you would do something like:
header.Private_0045_10xx_Creator = 'MY_PRIVATE_BLOCK';
header.Private_0045_1001 = int32(65535);
If you then write this out using dicomwrite and read it back in using hdr = dicominfo('mynewimg');, you can see that it really did write the value as a 32-bit integer even though, unfortunately, if is always going to read the data in as a vector of uint8 values.
>> hdr.Private_0045_1001
ans =
255
255
0
0
As long as you know what type to expect, you should be able to typecast the data back to the desired type after you've read the header. For example:
>> typecast(hdr.Private_0045_1001, 'int32')
ans =
65535
I know I'm about 8 years late, but have you tried
dicomwrite(..., 'VR', 'explicit')
?
It solves the "reading as uint8" problem for me.
Edit:
Actually, it looks like you need to specify a dicom dictionary with the VR of that tag. If you combine this with 'VR', 'explicit', then the program reading the dicom won't need to dictionary file.