XMP toolbox for Matlab

Has anyone heard of anything that would make working with XMP metadata in Matlab easier?
For instance, EXIF metadata can be read simply by using the exifread command -
output = exifread(filename);
I've found this thread, but it seems to be dead.
Currently I am thinking about the following options:
Writing MEX file using C++ XMP SDK
Calling Java routines using JAVA XMP SDK
To summarize, the question is:
Do you have any idea how XMP can be read/written in Matlab?

XMP is just XML, so you can use any MATLAB XML toolbox. My personal favourite is xml_io_tools.
If you want to use the SDK to avoid having to manually interpret what bits of the XML means, then of your two options the Java one sounds preferable. Calling Java from MATLAB is straightforward, and you avoid the hassle of building things that MEX entails.
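For example, a standalone .xmp sidecar file can be parsed directly with MATLAB's built-in xmlread; here is a minimal sketch (the file name and the dc:format element are placeholders for whatever your XMP actually contains):
doc = xmlread('photo.xmp');                    % parse the XMP packet as XML
nodes = doc.getElementsByTagName('dc:format'); % look up an example property
if nodes.getLength > 0
    fmt = char(nodes.item(0).getTextContent);  % convert the Java string to char
    fprintf('dc:format = %s\n', fmt);
end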

I have found the answer. The best way is to download ExifTool and any Matlab JSON parser. This combination can extract XMP metadata from many file formats, including .DNG, .XMP, .JPEG, and .TIFF.
Step 1: Extract the info into a temporary JSON file by using
system(['exiftool -struct -j ' fileName '>' tempFile]);
Step 2: Call the JSON parser on the tempFile
Step 3: You now have the data in a Matlab struct.
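Putting the three steps together, a minimal sketch could look like the following (assuming exiftool is on the system path, and using MATLAB's built-in jsondecode from R2016b+ in place of a separate JSON parser; the file names are examples):
fileName = 'photo.dng';
tempFile = [tempname '.json'];
system(['exiftool -struct -j "' fileName '" > "' tempFile '"']); % step 1: dump metadata as JSON
meta = jsondecode(fileread(tempFile));                           % step 2: parse the JSON
delete(tempFile);
% step 3: meta is now a Matlab struct; exiftool emits a JSON array,
% so you get one element per input file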

Related

How to convert las file to ply file?

I want to open my 3D point clouds in MATLAB, but they are in .las files. How can I display them in MATLAB?
I heard that .ply files can be opened as 3D point data in MATLAB, so I want to know how to convert .las files to .ply files.
There is a .las file reader for matlab here:
https://es.mathworks.com/matlabcentral/fileexchange/48073-lasdata
Once you have the data in matlab you can use these point cloud tools, which are part of the computer vision toolbox:
https://es.mathworks.com/help/vision/3-d-point-cloud-processing.html
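A rough sketch combining the two (I'm assuming lasdata exposes x/y/z properties, so check its documentation for the exact field names; pointCloud, pcshow and pcwrite need the Computer Vision Toolbox):
las = lasdata('cloud.las');            % read the .las file with the File Exchange class
pc  = pointCloud([las.x las.y las.z]); % build a point cloud object
pcshow(pc);                            % display the 3D point cloud
pcwrite(pc, 'cloud.ply');              % optionally write it out as a .ply file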
If you want to embrace the open source force, I'm writing a Python (easy transition from matlab) library for point cloud processing:
https://github.com/daavoo/pyntcloud
You can use the free and open-source CloudCompare software.
On the command line:
CloudCompare -O file_to_convert.las -C_EXPORT_FMT PLY -SAVE_CLOUDS
Take care with the order of the options: it seems that -SAVE_CLOUDS must come last.
That will result in a binary-format PLY file in the same directory as the file to convert, named using the original filename and the date of export, like: file_to_convert_2019-07-18_13h32_06_751.ply
I found no way to specify the output file name (should you find one please comment below).
Should you want a more predictable name, add option -NO_TIMESTAMP before the option -SAVE_CLOUDS (but then you risk overwriting files so be careful).
More help (such as how to export in ASCII format) is available in the documentation.
I timed this on a powerful PC: it took 170 s to convert a 2.7 GB LAS file with 102M points (XYZ, intensity, time).
If you have LAStools installed, you can use las2txt to convert your *.las/*.laz files into *.xyz format, which MeshLab can import natively as a point cloud and which may then be converted into a mesh.
There are a multitude of caveats to that, depending on the source of your data set.

Index PDF files and generate keywords summary

I have a large number of PDF files in my local filesystem that I use as a documentation base, and I would like to create an index of these files.
I would like to :
Parse the contents of the PDF files to get keywords.
Select the most relevant keywords to make a summary.
Create static HTML pages for some keywords with entries linked to the appropriate files.
My questions are :
Is there an existing tool to perform the whole job?
What is the most appropriate tool to parse the PDF files' content, filter (by word size), and count the words?
I consider using Perl, swish-e, pdfgrep to make a script. Do you know other tools which could be useful?
Given that points 2 and 3 seem custom, I'd recommend writing your own script: use a tool from it to parse the PDF, process the output as you please, and write the HTML (perhaps using another tool).
Perl is well suited for that, since it excels at the kind of text processing you'll need and also provides support for working with all kinds of file formats via modules.
As for reading pdf, here are some options if your needs aren't too elaborate
Use CAM::PDF (and CAM::PDF::PageText) or PDF-API2 modules
Use pdftotext from the poppler library (probably in poppler-utils package)
Use pdftohtml with the -xml option, and read the generated simple XML file with XML::LibXML or XML::Twig
The last two are external tools which you use via Perl's builtins like system.
The following text processing, to build your summary and design the output, is precisely what languages like Perl are for. The couple of tasks that are mentioned take a few lines of code.
Then write out HTML, either directly if simple or using a suitable module. Given your purpose, you may want to look into HTML::Template. Also see this post, for example.
Full parsing of PDF may be infeasible, but if the files aren't too complex it should work.
If your process for selecting keywords and building statistics is fairly common, there are integrated tools for document management (search for bibliography managers). However, I think that most of them resort to external tools to parse pdf so you may still be better off with your own script.

Information about Simulink MDL and SLX formats?

What information is available about these file formats? What tools are available for parsing these files?
Very little information is publicly available. Here's the little I've found:
MDL and SLX are MathWorks proprietary file formats for storing Simulink models. SLX was introduced in Simulink R2012a and made the default file format in R2012b. Apart from the file structure, the content of SLX and MDL files is very similar. For example, key-value pairs appear to be the same between the two formats. People often say that parsing these files is a bad idea because they can change between Simulink versions (see e.g. am304's and my comments above), but I have not seen much evidence of this.
The MDL format seems to have been developed in-house at MathWorks. There seems to have been an MDL parser for python, but it was of limited functionality, and the website is down as of May 2014.
An SLX file is a zip file containing a collection of XML files, with most of the model specification stored in simulink/blockdiagram.xml. am304 pointed out this information from the MathWorks website:
SLX is a compressed package that conforms to the Open Packaging Conventions (OPC) interoperability standard. SLX stores model information using Unicode® UTF-8 in XML and other international formats. Saving Simulink models in the SLX format:
Typically reduces file size compared to MDL. The file size reduction between MDL and SLX varies depending on the model.
Solves some problems in previous releases with loading and saving MDL files containing Korean and Chinese characters.
Enables incremental loading and saving. Simulink optimizes performance and memory usage by loading only required parts of the model and saving only modified parts of the model.
Here are a few more references besides the ones in the text above:
How convert simulink files to XML
http://www.scootersoftware.com/vbulletin/showthread.php?t=11568
http://blog.xogeny.com/blog/dont-zip/
http://blog.developpez.com/matlab/p11469/simulink-2/nouveau-format-slx-pour-les-modeles-simulink
Update (2015/04/02)
The new version of the Simulink Library for Java has full SLX format support. The documentation is not explicit, but the source code contains all details for parsing it.
Old answer
As answered by rob, the Simulink Library for Java supports Simulink's MDL file format and also can parse the Stateflow content. The library is Open Source, but the only documentation is the source code.
We are currently (as of September 2014) working on SLX support and expect to release this in the next 1 or 2 months. If you need the code before this time, feel free to contact me.
It is true that when using the library, your code may possibly break with a new Simulink release, as the file format is not documented and we had to reverse engineer most of it. However, we are currently actively updating the library in case of problems and with the source code you might be able to fix it even if we are not around.
PS: I would have posted this as a comment to rob's answer, but it seems I do not have sufficient reputation to do so :(
Disclosure: I am one of the developers of the mentioned library.
What information is available about these file formats?
MathWorks does have some documentation for the MDL file format in R2007b.
SLX files are zipfile containers whose internal structure is based on OOXML's OPC format. The SLX files contain one or more XML files whose internal structure is similar to that of an MDL file, but in XML format. In addition, binary resources such as graphics may be stored in separate JPG files rather than being text-encoded and directly embedded as they are in an MDL file.
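If you just want to peek inside an SLX file from MATLAB itself, a quick sketch relying only on the zip-of-XML structure described above (the model name is an example) might be:
files = unzip('model.slx', tempname);               % extract the zip container to a temp folder
idx = find(contains(files, 'blockdiagram.xml'), 1); % locate the main model XML
doc = xmlread(files{idx});                          % parse it as XML
disp(doc.getDocumentElement.getTagName)             % inspect the root element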
Both formats change as new features are added to Simulink, but you can expect SLX to be less stable as MathWorks refactors SLX's internal file structure. For example, in R2014b, MathWorks has started breaking sections of the traditionally monolithic blockdiagram.xml out into separate files such as stateflow.xml and graphicalInterface.xml.
What tools are available for parsing these files?
There are a few publicly-available libraries/APIs for parsing Simulink, but I haven't used any of them so I'm not sure how well they work.
Simulink Library for Java (formerly called ConQAT) (Java) - MDL, SLX
TSMP - Tiny Simulink Model Parser (.NET) - MDL
Simulink-Model-Parsing-Tools (Python) - MDL
You may also be able to find others by searching for Simulink parser.
If none of those do the trick, some commercial tools parse MDL and SLX directly rather than relying on the MATLAB API. You could possibly inquire about licensing the parser used in some commercially-available Simulink tool.

Extract .mat data without matlab - tried scilab unsuccessfully

I've downloaded a data set that I am interested in. However, it is in .mat format and I do not have access to Matlab.
I've done some googling, and it seems I can open it in SciLab.
I tried a few things, but I haven't found any good tutorials on this.
I did
fd = matfile_open("file.mat")
matfile_listvar(fd)
and that prints out the filename without the extension. I tried
var1 = matfile_varreadnext(fd)
and that just gives me "var1 = "
I don't really know how the data is organized. The repository described the data it contains, but not how it is organized.
So, my question is, what am I doing wrong in extracting/viewing this data? I'm not committed to SciLab, if there is a better tool for this I am open to that.
One option is to use Octave, which can read .mat files and run most Matlab .m files. Octave is open source, with binaries available for Linux, Mac, and Windows. Inside Octave you can load the file using:
load file
See Octave's manual section 14.1.3 Simple File I/O for more details.
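Since you don't yet know how the data is organized, a small sketch (assuming the file is called file.mat) that loads everything into a struct and lists the variable names may help:
vars = load('file.mat');   % load all variables into one struct
fieldnames(vars)           % list the variable names stored in the file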
In Scilab:
loadmatfile('file.mat');
(Source)
I had this same interest a few years back. I used this question as a guide. It uses Python and SciPy. There are options for NumPy and HDF5 as well. Another option is to write your own reader for the .mat format in whatever language you need. Here is the link to the mat file format definition.

Is there any way to read MATLAB's .mat files in Perl?

I have some data generated in MATLAB that I want to process using Perl. I saved the data from MATLAB in a .mat file. Is there any way to read it in Perl?
One option would be to save the binary MAT file as ASCII from inside MATLAB using something like:
load('test_data.mat');
save('test_data.asc', 'var1', 'var2', '-ascii');
Then you would have ASCII data to process in Perl.
If you need a solution completely written in Perl, then you should be able to automate the process using the Math::MATLAB package on CPAN.
NOTE: If Python is an option, you could use the loadmat function in the SciPy Python library.
The Java library JMatIO has worked well for me. Maybe you can try calling it from Perl via Inline::Java.