Convert MathType equation embedded in OLE Binary file to MathML - ms-word

I am trying to convert MathType's equation which is stored as OLE binary file to MathML using MathType's SDK.
The input file for my program is a DocX which would contain embdedd MathType equations. I am looking for a solution thats independent of using MS Word. DocX is a zip file, and once it is extracted we can find the a binary file for each OLE object in the folder "word/embeddings/". Typically the file name would be oleObject1.bin, oleObject2.bin etc.
When I checked with MathType SDK it has a class "ConvertEquation" which has following method:
virtual public bool Convert(EquationInput ei, EquationOutput eo)
EquationInput is an abstract class for which following concrete classes are made available:
EquationInputFileText
EquationInputFileWMF2
EquationInputFileWMF
EquationInputFileGIF
EquationInputFileEPS
In the above listed classes none of them seems to support OLE binary.
According to MathType's SDK doc, MTEF data is saved as the native data format of the object. Whenever an equation object is to be written to an OLE "stream", a 28- byte header is written, followed by the MTEF data. I guess this is exactly what is present in this binary file. But just that there seems to be no way by which this format can be made to be used by SDK to convert it into MathML. Any thoughts?
Thanks

you can convert mathtype wmf file to mathml as follow:
ConvertEquation conv = new ConvertEquation();
var input = EquationInputFileWMF("mathTYpe.wmf");
var output = EquationOutputFileText("MathMLName.txt", "MathML2 (m namespace).tdl"));
conv.Convert(input , input);
the "MathML2 (m namespace).tdl" string stand for "tdl" file which contains in "MathType\Translators" path, if you open the Translators path ,you can find many of type.

You may try MathMagic equation editor (Windows version).
MathMagic can extract all Word embeded equations out of the document(s) (.doc or .docx), and can save/covert them to other format (such as JPG, PNG, BMP, PDF, TeX, LaTeX, MathML, ...) as a batch conversion job.
Unfortunately, their trial version does not support this batch conversion. A valid license (even 1-month or 2-month license) is required to enable the Conversion feature.

Related

How to get the extension file of Input Stream

I have a code
var test = Base64.getDecoder.decode(base64);
ByteArrayInputStream(test);
var input_stream = new ByteArrayInputStream(test);
Logger.debug(test.getClass.getSimpleName)
How do I get the file extension of the variable input_stream?
I believe you are asking how to determine the image file format from its bytes.
This has already been answered here: Java get image extension/type using BufferedImage from URL
I would not use the term "file extension" to refer to the file format -- while many systems (including Windows) conventionally use the file name extension to indicate the file format, you cannot always rely on this convention, and these are two separate concepts. I found the above question by Googling "java how to detect image type"
Good luck!

Read Biopax file in matlab

How can we read Biopax file in matlab, biopax is special type of xml file. How we can retrieve information from it using matlab.
There is a BioConductor package for reading and manipulating BioPAX files in R (http://bioconductor.org/packages/release/bioc/html/paxtoolsr.html).
Check out the related tutorial on getting set up in R:
https://www.youtube.com/channel/UCWSbcyynroIp-f6O3sfz7jg
AND using PaxtoolsR:
http://bioconductor.org/packages/release/bioc/vignettes/paxtoolsr/inst/doc/using_paxtoolsr.html

can open xml sdk be used in creating xml files?

is it possible to use the OPEN XML SDK and generate an xml file that contains some metadata of a particular docx file?
details: i have a docx file, from which i want to extract some metadata(using open xml) and display them as xml file and later use Jquery to present them in a more readable form.
You can use the SDK to extract info from the various properties parts which may be present in the docx (for example, the core properties part, which included dublin core type info).
You can extract it in its native XML form:
<cp:coreProperties
xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core- properties"
xmlns:dc="http://purl.org/dc/elements/1.1/" .. >
<dc:creator>Joe</dc:creator>
<cp:lastModifiedBy>Joe</cp:lastModifiedBy>
<cp:revision>1</cp:revision>
<dcterms:created xsi:type="dcterms:W3CDTF">2010-11-10T00:32:00Z</dcterms:created>
<dcterms:modified xsi:type="dcterms:W3CDTF">2010-11-10T00:33:00Z</dcterms:modified>
</cp:coreProperties>
or, in some other XML dialect of your own choosing.
I know question was posted a long time ago, but first result of google search sent me here. So if there are others looking for a solution to this, there is a snippet on MSDN website https://msdn.microsoft.com/en-us/library/office/cc489219.aspx
short answer is... using XmlTextWritter, and it applies to Office 2013 afaik:
// Add the CoreFilePropertiesPart part in the new word processing document.
var coreFilePropPart = wordDoc.AddCoreFilePropertiesPart();
using (XmlTextWriter writer = new XmlTextWriter(coreFilePropPart.GetStream(FileMode.Create), System.Text.Encoding.UTF8))
{
writer.WriteRaw("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n<cp:coreProperties xmlns:cp=\"http://schemas.openxmlformats.org/package/2006/metadata/core-properties\"></cp:coreProperties>");
writer.Flush();
}

Do we have any Equivalent of Response.AppendHeader in windows application

I came around this technique of converting datatable to excel
http://www26.brinkster.com/mvark/dyna/downloadasexcel.html
Do we have any Equivalent of Response.AppendHeader in windows application in C#.
Regards
Hema
The trick in the code sample that you have mentioned to dynamically generate an Excel file is based on the fact that documents can be converted from Word/Excel to HTML (File->Save As) and vice versa. Essentially a HTML page containing Office XML is created & in a web application a file download is triggered with the help of the following Response.AppendHeader statements -
Response.AppendHeader("Content-Type", "application/vnd.ms-excel");
Response.AppendHeader("Content-disposition", "attachment; filename=my.xls");
If you want to use this technique in a Winforms application, just save the string content as a text file and give the file an extension of ".xls". Instead of the last 3 lines in the sample's Page_Load method, replace it with this line -
System.IO.File.WriteAllText(#"C:\Report.xls", strBody);
HTH

generic text reading

I am working on a project where I need to read some generic text...I am looking for any api by I can read generic text and also can convert it to .csv file...
Can any one plz help...
using java on windows os...
--------------------------MORE Detail---------------------------------------------------------------------------------------
let me clarify:
Assume I have a pdf document or for that matter any file type document. I intend to use Print to Generic text printer option and get the file in that format.Finally, I intend to use some API which shoudl enable me to programatically read this Generic Text Format file. I intend to extract text from this generic text file.
So, be it any file (.doc/.pdf/.xls etc wtatever), I intend to create a Generic Text Format file using print option. Then run my code to read those files and extract some information.
PS: Assume that I have a Status report form with standard fields. Ok. But, some people might submit in .pdf, some in .doc , some in text format. But, every document contains same fields, but probably with diferent layouts.
Now, I am looking for a generic solution, by which i shoudl be able to convert every file type in to generic text file format and then apply some logic to extract my Status report fields.
In Java this is more or less what you need to read a text file, assuming it's comma separated (just change the string in the "line.split" method if you need something else). It also skips the header.
public void parse(String filename) throws IOException {
File file = new File(filename);
FileInputStream fis = new FileInputStream(file);
InputStreamReader isr = new InputStreamReader(fis);
BufferedReader br = new BufferedReader(isr);
String line;
int header = 1;
while ((line = br.readLine()) != null) {
if (header == 1) {
header = 2;
continue; // skips header
}
String[] splitter = line.split(",");
// do whatever
System.out.println(splitter[0]);
}
}
CSV is a format for data in columns. It's not very useful for, say, a Wikipedia article.
The Apache Tika library will take all kinds of data and turn it into bland XML, from which you can make CSV as you like.
It would help if you would edit your question to clarify 'generic' versus' generated', and tell more about the data.
As for Windows printer drivers, are you looking to do something like 'print to pdf' as 'print to csv'? If so, I suspect that you need to start from MSDN samples of printer drivers and code this the hard way.
The so-called 'generic text file format' is not a structured format. It's completely unpredictable what you will find in there for any given input to the printer system.
A generic free book: Text Processing in Python
Just used the standard Java classes for I/O:
BufferedWriter, File, FileWriter, IOException, PrintWriter
.csv is simply a comma-separated values file. So just name your output file with a .csv extension.
You'll also need to figure out how you'd like to split your content.
Here are Java examples to get you going:
writing to a text file
how to read lines from a file