Write a struct into a DICOM header - matlab

I created a private DICOM tag and I would like to know whether it is possible to use this tag to store a struct in a DICOM file using dicomwrite (or similar), instead of creating a field inside the DICOM header for each struct field.
(Something like saving a patient's name, but using double data instead of char data.)
Here is an example:
headerdicom = dicominfo('Test.dcm');
a.a = 1; a.b = 2; a.c = 3;
headerdicom.Private_0011_10xx_Creator = a;
img = dicomread('Test.dcm');
dicomwrite(img, 'test_modif.dcm', 'ObjectType', 'MR Image Storage', 'WritePrivate', true, headerdicom)
Undefined function 'fieldnames' for input arguments of type 'double'.
Thank you all in advance,

Depending on what "struct" means, here are your options. Since you want to use a private tag, which means that no application but yours will be able to interpret it, you can choose the solution that is technically most appropriate. Basically, your question is "which Value Representation should I assign to my private attribute using the DICOM toolkit of my choice?":
Sequence:
There is a DICOM Value Representation "Sequence" (VR=SQ) which allows you to store a list of attributes of different types. This VR is closest to a struct. A sequence can contain an arbitrary number of items, each of which has the same attributes in the same order. Each attribute can have its own VR, so if your struct contains different data types (like string, integer, float), this would be my recommendation.
Multi-value attribute:
DICOM supports the concept of "Value Multiplicity". This means that a single attribute can contain multiple values, which are written separated by backslashes (string VRs store the backslash as a delimiter; binary VRs such as FD store the values back to back). As the VR is a property of the attribute, all values must have the same type. If I understand you correctly, you have a list of floating point numbers, which could be encoded as an array of doubles in one field with VR=FD (Floating Point Double): 0.001\0.003\1.234...
Most toolkits support indexed access to the individual values.
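To make the multi-value idea concrete, here is a minimal Python sketch (standard library only) of how a list of doubles could be laid out as an FD payload and read back by index. It assumes a little-endian transfer syntax; the helper names encode_fd and decode_fd are made up for illustration and do not belong to any DICOM toolkit.

```python
import struct

def encode_fd(values):
    """Pack floats as consecutive little-endian IEEE 754 doubles (8 bytes each)."""
    return struct.pack("<%dd" % len(values), *values)

def decode_fd(raw):
    """Unpack a multi-valued FD payload; the value count follows from the length."""
    if len(raw) % 8:
        raise ValueError("FD payload length must be a multiple of 8")
    return list(struct.unpack("<%dd" % (len(raw) // 8), raw))

values = [0.001, 0.003, 1.234]
raw = encode_fd(values)        # 3 values -> 24 bytes
decoded = decode_fd(raw)
print(len(raw), decoded[2])    # indexed access to the third value
```

A real toolkit does this packing for you once it knows the attribute's VR; the sketch only shows why all values must share one type.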
"Blob":
You can use an attribute with VR=OB (Other Byte), which is also used for encoding pixel data. It can contain up to 4 GB of binary data. The attribute's length tells you how many bytes its value contains. If you just want to copy the memory from / to the struct, this would be the way to go, but obviously it is the weakest approach in terms of type safety and correctness of encoding. You lose the built-in methods of your DICOM toolkit that ensure these properties.
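A rough Python sketch of the blob approach, using the three-field struct from the question. The layout string and field order are assumptions that writer and reader would have to agree on themselves, which is exactly the type-safety cost described above.

```python
import struct

# Hypothetical layout: three little-endian doubles in the order a, b, c.
LAYOUT = "<3d"
FIELDS = ("a", "b", "c")

def struct_to_blob(record):
    """Serialise the record into raw bytes suitable for an OB value."""
    return struct.pack(LAYOUT, *(record[name] for name in FIELDS))

def blob_to_struct(blob):
    """Rebuild the record; only works if the reader knows LAYOUT and FIELDS."""
    return dict(zip(FIELDS, struct.unpack(LAYOUT, blob)))

record = {"a": 1.0, "b": 2.0, "c": 3.0}
blob = struct_to_blob(record)
restored = blob_to_struct(blob)
print(len(blob), restored)
```

Nothing in the bytes themselves says how to interpret them, so the layout has to be documented next to the private tag.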
To add a private attribute, you have to:
1) Reserve a range for the attribute by specifying an odd group number and a prefix (2 hex digits) for the element numbers. For example, group = 0x0011 with element = 0x10xx reserves the range (0x0011, 0x1000) - (0x0011, 0x10ff). This is done by writing a Private Creator DICOM tag, which holds a manufacturer name. So I suspect that instead of
headerdicom.Private_0011_10xx_Creator = a;
it should read e.g.
headerdicom.Private_0011_10xx_Creator = 'Gabs';
2) Register your private tags in the private dictionary, most of the time by specifying the Private Creator, group, element and VR (one of the options above).
I am not sure how this can be done in MATLAB.
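The reservation rule can be sketched in a few lines of Python. The function name private_block_range is hypothetical; the rule it encodes is that a Private Creator element (gggg,00bb) in an odd group reserves the elements (gggg,bb00) through (gggg,bbFF).

```python
def private_block_range(group, creator_element):
    """Return (first_tag, last_tag) reserved by a Private Creator element."""
    if group % 2 == 0:
        raise ValueError("private groups must have an odd group number")
    if not 0x0010 <= creator_element <= 0x00FF:
        raise ValueError("Private Creator elements lie in 0x0010-0x00FF")
    first = (group, creator_element << 8)          # block byte becomes the high byte
    last = (group, (creator_element << 8) | 0xFF)  # low byte runs 00..FF
    return first, last

first, last = private_block_range(0x0011, 0x0010)
print("(%04X,%04X) - (%04X,%04X)" % (first + last))  # (0011,1000) - (0011,10FF)
```

So the creator string stored at (0011,0010) owns every element written as 10xx in that group.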

Related

Creating a structure within a structure with a dynamic name

I have large data sets which I want to work with in MATLAB.
I have a struct called Trail containing several structures called trail1, trail2 ...
which in turn contain several matrices. I now want to add another point to, for instance, trail1.
I can do that with Trail.trail1.a2rotated(i,:) = rotpoint'; the problem is that I have to do it in a loop where the trail number as well as the a2rotated name changes, e.g. to a3rot...
I tried to do it like this:
name ="trail"+num2str(z)+".a2rotated"+"("+i+",:)";
name = convertStringsToChars(name);
Trail.(name) = rotpoint'
But that gives me the error: Invalid field name: 'trail1.a2rotated(1,:)'.
Does someone have a solution?
The name in parentheses after the dot must be the name of a field of the struct. Other indexing operations must be done separately:
Trail.("trail"+z).a2rotated(i,:)
But you might be better off making trail(z) an array instead of separate fields with a number in the name.

What is contained in the "function workspace" field in .mat file?

I'm working with .mat files which are saved at the end of a program. The command is save foo.mat so everything is saved. I'm hoping to determine if the program changes by inspecting the .mat files. I see that from run to run, most of the .mat file is the same, but the field labeled __function_workspace__ changes somewhat.
(I am inspecting the .mat files via scipy.io.loadmat -- just loading the files and printing them out as plain text and then comparing the text. I found that save -ascii in Matlab doesn't put string labels on things, so going through Python is roundabout, but I get labels and that's useful.)
I am trying to determine from where these changes originate. Can anyone explain what __function_workspace__ contains? Why would it not be the same from one run of a given program to the next?
The variables I am really interested in are the same, but I worry that I might be overlooking some changes that might come back to bite me. Thanks in advance for any light you can shed on this problem.
EDIT: As I mentioned in a comment, the value of __function_workspace__ is an array of integers. I looked at the elements of the array and it appears that these numbers are ASCII or non-ASCII character codes. I see runs of characters which look like names of variables or functions, so that makes sense. But there are also some characters (non-ASCII) which don't seem to be part of a name, and there are a lot of null (zero) characters too. So aside from seeing names of things in __function_workspace__, I'm not sure what that stuff is exactly.
SECOND EDIT: I found that after commenting out calls to plotting functions, the content of __function_workspace__ is the same from one run of the program to the next, so that's great. At this point the only difference from one run to the next is that there is a __header__ field which contains a timestamp for the time at which the .mat file was created, which changes from run to run.
THIRD EDIT: I found an article, http://nbviewer.jupyter.org/gist/mbauman/9121961 "Parsing MAT files with class objects in them", about reverse-engineering the __function_workspace__ field. Thanks to Matt Bauman for this very enlightening article and thanks to @mpaskov for the pointer. It appears that __function_workspace__ is an undocumented catch-all for various data, only one part of which is actually a "function workspace".
1) Diffing .mat files
You may want to take a look at DiffPlug. It can do diffs of MAT files and I believe there is a command line interface for it as well.
2) Contents of __function_workspace__
SciPy's __function_workspace__ refers to a special variable at the end of a MAT file that contains extra data needed for reference types (e.g. table, string, handle, etc.) and various other stuff that is not covered by the official documentation. The name is misleading as it really refers to the "Subsystem" (briefly mentioned in the official spec as an offset in the header).
For example, if you save a reference type, e.g., emptyString = "", the resulting .mat will contain the following two entries:
(1) The variable itself. It looks sort of like a UInt32 matrix, but is actually an Opaque MCOS Reference (MATLAB Class Object System) to a string object at some location in the subsystem.
[0] Compressed (81 bytes, position = 128)
    [0] Matrix (144 bytes, position = 0)
        [0] UInt32[2] = [17, 0] // Opaque
        [1] Int8[11] = ['emptyString'] // Variable Name
        [2] Int8[4] = ['MCOS'] // Object Type
        [3] Int8[6] = ['string'] // Class Name
        [4] Matrix (72 bytes, position = 72)
            [0] UInt32[2] = [13, 0] // UInt32
            [1] Int32[2] = [6, 1] // Dimensions
            [2] Int8[0] = [''] // Variable Name (not needed)
            [3] UInt32[6] = [-587202560, 2, 1, 1, 1, 1] // Data (Reference Target)
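The first data value in the dump is printed as a negative number although the field is labelled UInt32. That is only a signed rendering of the same 32 bits, which a short Python check (standard library only) makes visible:

```python
import struct

signed_value = -587202560
# Reinterpret the same four bytes as an unsigned 32-bit integer.
(unsigned_value,) = struct.unpack("<I", struct.pack("<i", signed_value))
print(hex(unsigned_value))  # 0xdd000000
```

When comparing dumps from different tools, it helps to normalise such values to one signedness first.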
(2) A UInt8 matrix without name (SciPy renamed this to __function_workspace__) at the end of the file. Aside from the missing name it looks like a standard matrix, but the data is actually another MAT file (with a reduced header) that contains the real data.
[1] Compressed (251 bytes, position = 217)
    [0] Matrix (968 bytes, position = 0)
        [0] UInt32[2] = [9, 0] // UInt8
        [1] Int32[2] = [1, 920] // Dimensions
        [2] Int8[0] = [''] // Variable Name
        [3] ... 920 bytes ... // Data (Nested MAT File)
The format of the data is unfortunately completely undocumented and somewhat of a mess. I could post the contents of the Subsystem, but it gets somewhat overwhelming even for such a simple case. It's essentially a MAT file that contains a struct that contains a special variable (MCOS FileWrapper__) that contains a cell array with various values, including one that magically encodes various Object Properties.
Matt Bauman has done some great reverse engineering (Parsing MAT files with class objects in them) that I believe all supporting implementations are based on. The MFL Java library contains a full (read-only) implementation of this (see McosFileWrapper.java).
Some updates on Matt Bauman's post that we found are:
The MCOS reference can refer to an array of handle objects and may have more than 6 values. It contains sizing information followed by an array of indices (see McosReference.java).
The Object Id field looks like a unique id, but the order seems random and sometimes doesn't match. I don't know what this value is, but completely ignoring it seems to work well :)
I've seen Segment 5 populated in .fig files, but I haven't been able to narrow down what's in there yet.
Edit: FYI, once the string object is correctly parsed and all properties are filled in, the actual string value is encoded in yet another undocumented format (see testDoubleQuoteString).
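For anyone who wants to locate the Subsystem without a full parser, the offset sits in the 128-byte file header. Below is a minimal Python sketch based on my reading of the published Level 5 MAT-file layout (116 bytes of text, an 8-byte subsystem offset, a 2-byte version and a 2-byte endian indicator). The function name parse_mat5_header and the synthetic demo header are made up for illustration.

```python
import struct

def parse_mat5_header(header):
    """Split a 128-byte Level 5 MAT-file header into its parts."""
    if len(header) != 128:
        raise ValueError("expected exactly 128 header bytes")
    endian_mark = header[126:128]
    if endian_mark == b"IM":
        order = "<"   # file was written little-endian
    elif endian_mark == b"MI":
        order = ">"   # file was written big-endian
    else:
        raise ValueError("not a Level 5 MAT-file header")
    # The descriptive text is where the creation timestamp lives.
    text = header[:116].decode("ascii", "replace").rstrip(" \x00")
    # Files without a subsystem may hold zeros or spaces here instead.
    (subsystem_offset,) = struct.unpack(order + "Q", header[116:124])
    (version,) = struct.unpack(order + "H", header[124:126])
    return {"text": text, "subsystem_offset": subsystem_offset,
            "version": version, "byte_order": order}

# Synthetic header for demonstration only (not taken from a real file).
demo = (b"MATLAB 5.0 MAT-file, demo".ljust(116)
        + struct.pack("<Q", 217) + struct.pack("<H", 0x0100) + b"IM")
info = parse_mat5_header(demo)
print(info["subsystem_offset"], info["byte_order"])
```

For a real file you would pass the result of reading the first 128 bytes in binary mode; the text portion is the part that changes with every save because of the timestamp.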

MIDL marshaling array of structures using size_is in unmanaged c++

I’m trying to retrieve an array of structures through a COM interface. It works when the number of structures is 1. When the number of structures is greater than 1, only the first structure is marshaled correctly. The remaining structures in the array have garbage data.
My interface looks like this:
typedef struct tagINTOBJINTERFACE
{
    long lObjectId;
    IMyObject* pObj;
} INTOBJINTERFACE;
[
    object,
    uuid(<removed>),
    dual,
    nonextensible,
    helpstring("Interface"),
    pointer_default(unique)
]
interface IMyInterface : IUnknown {
    HRESULT CreateObjects(
        [in] VARIANT* pvDataStream,
        [out] long* Count,
        [out, size_is(,*Count)] INTOBJINTERFACE** ppStruct
    );
};
I allocate the structure memory like this:
long lCountInterfaces = listInterfaces.GetCount();
long lMemSize = lCountInterfaces * sizeof(INTOBJINTERFACE);
INTOBJINTERFACE* pstruct = (INTOBJINTERFACE*) CoTaskMemAlloc( lMemSize );
And then fill in the members of each structure in the array. I can see in the debugger that all members of all array elements are properly assigned.
After filling in the structures, I assign “*ppStruct = pstruct” to pass the array out.
I can also see that the out parameter “*Count” is properly set to the correct number of elements.
Why doesn’t this work?
Reason:
Your application uses the universal marshaller from Windows for marshalling.
The universal marshaller reads the metadata from your type library (*.tlb).
The generated type library doesn't support size_is.
Todo:
You should use the Proxy/Stub DLL generated by Visual Studio (...PS project).
- Build the Proxy/Stub DLL
- call "regsvr32 "
- remove the "TypeLib = s '{?????-...-????}'" entry from your server's "*.rgs" file
In addition to Joerg's answer that using size_is is not possible, here is what's possible: SAFEARRAY.
Keywords: Safearray of UDT
Explanation and examples are here
Short summary:
Define structure with a GUID.
Create object of type IRecordInfo that describes your structure using the type library.
Use SafeArrayCreateEx to create SAFEARRAY of type VT_RECORD.
Fill it with data.
Retrieve on the other side.

Split TextChunk into words

I've found this example which splits a PDF document into TextChunks.
Is there either
a) a method to split each TextChunk further into words/characters and still be able to find its location?
or
b) a method to parse a PDF into words/characters instead of chunks and find the location?
Is there a method to split each TextChunk further into words/characters and still be able to find its location?
You cannot split these TextChunk objects further because the TextChunk class is merely a helper class transporting a very small amount of information, cf. its constructor arguments String str, Vector startLocation, Vector endLocation, float charSpaceWidth. In particular, it carries no information on the individual character widths, nor on the text size and font from which those widths could be derived.
But you can of course change the method RenderText (in which the incoming more complete TextRenderInfo instances are reduced to TextChunk instances):
public virtual void RenderText(TextRenderInfo renderInfo) {
    LineSegment segment = renderInfo.GetBaseline();
    TextChunk location = new TextChunk(renderInfo.GetText(), segment.GetStartPoint(), segment.GetEndPoint(), renderInfo.GetSingleSpaceWidth());
    locationalResult.Add(location);
}
In particular you can first split the TextRenderInfo instance using its GetCharacterRenderInfos() method into single character TextRenderInfo instances, loop through these and create individual TextChunk instances for each of them.
You probably don't see that method in the repository where you are looking as iTextSharp has already switched to the new SourceForge versioning infrastructure. Thus, you should switch to the current iTextSharp repository.
Is there a method to parse a PDF into words/characters instead of chunks and find the location?
Of course you can implement IRenderListener to create an extraction strategy which does exactly what you need. You can find some discussions of that topic on Stack Overflow for iText and iTextSharp, e.g. "ITextSharp Find coordinates of specific text in PDF", "Get the exact Stringposition in PDF", "Retrieve the respective coordinates of all words on the page with itextsharp" and others.

Writing Private Dicom data in matlab without modifying the dictionary

I am reading a DICOM file in MATLAB, modifying some of its data and trying to save it into another file. While doing so, the private DICOM data is either not written at all (when 'WritePrivate' is set to 0) or written as a UINT8 array, which becomes incomprehensible and useless. I even tried to copy the data that I get from the original DICOM file to a new structure and write that to a new DICOM file, but even though the private data remains fine in the new structure, it does not remain so in the new DICOM file. Is there any way to keep this private data intact while copying it to a new DICOM file without changing the MATLAB DICOM dictionary?
I have provided the following code to show what I'm trying to do.
X=dicomread('Bad011_4CH_01.dcm');
metadata = dicominfo('Bad011_4CH_01.dcm');
metadata.PatientName.FamilyName='LastName';
metadata.PatientName.GivenName='FirstName';
birthday=metadata.PatientBirthDate;
year=birthday(1,1:4);
newyear=strcat(year,'0101');
metadata.PatientBirthDate=newyear;
names=fieldnames(metadata);
h=metadata;
dicomwrite(X,'example.dcm',h,'CreateMode','copy');
newh=dicominfo('example.dcm');
Here the data in newh contains none of the private data. If I change the dicomwrite call to the following:
dicomwrite(X,'example.dcm',h,'CreateMode','copy','WritePrivate',1);
In this case the private data gets changed into a UINT8 array and becomes useless. The ideal solution for my task would be to keep the private data in the newly created DICOM file without changing the MATLAB DICOM dictionary.
Have you tried something like:
dicomwrite(uint16(image), fileName, 'ObjectType', 'MR Image Storage', ...
'WritePrivate', true, header);
where "header" is a struct composed of name-value pairs using the same format as header data that you would get from MATLAB's dicominfo function? My general approach to image creation in MATLAB is to avoid using CreateMode 'copy' and instead build my own DICOM header by explicitly copying the attributes that it makes sense to copy and generating my own values for attributes that should have new values.
To write private tags, you would do something like:
header.Private_0045_10xx_Creator = 'MY_PRIVATE_BLOCK';
header.Private_0045_1001 = int32(65535);
If you then write this out using dicomwrite and read it back in using hdr = dicominfo('mynewimg');, you can see that it really did write the value as a 32-bit integer even though, unfortunately, it is always going to read the data in as a vector of uint8 values.
>> hdr.Private_0045_1001
ans =
255
255
0
0
As long as you know what type to expect, you should be able to typecast the data back to the desired type after you've read the header. For example:
>> typecast(hdr.Private_0045_1001, 'int32')
ans =
65535
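If the file is inspected outside MATLAB, the same reinterpretation can be sketched in Python with the standard struct module. This assumes the four bytes were written little-endian, as the 255, 255, 0, 0 readback above suggests.

```python
import struct

raw = bytes([255, 255, 0, 0])            # the uint8 values read back from the tag
(value,) = struct.unpack("<i", raw)       # reinterpret as one little-endian int32
print(value)                              # 65535

round_trip = list(struct.pack("<i", 65535))
print(round_trip)                         # [255, 255, 0, 0]
```

As with typecast, this only works when the reader already knows the intended type and byte order.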
I know I'm about 8 years late, but have you tried
dicomwrite(..., 'VR', 'explicit')
?
It solves the "reading as uint8" problem for me.
Edit:
Actually, it looks like you need to specify a DICOM dictionary with the VR of that tag. If you combine this with 'VR', 'explicit', then the program reading the DICOM file won't need the dictionary file.