OpenXML SDK and MathML - openxml

I use MathML to create some data blocks and I need to insert it throught OpenXML SDK into docx file. I've heard it is possible, but I didn't manage it. Could somebody help me with this problem?

As far as I know, the OpenXml SDK does not support presentation MathML out of the box.
Instead, the OpenXml SDK supports Office MathML.
So, to insert presentation MathML into a word document we first have
to transform the presentation MathML into Office MathML.
Fortunately, Microsoft provides a XSL file (called MML2OMML.xsl) to transform presentation MathML
into Office MathML. The file MML2OMML.xsl is located under %ProgramFiles%\Microsoft Office\Office12.
In conjunction with the .Net Framework class
XslCompiledTransform we are able to transform presentation MathML into Office MathML.
The next step is to create a OfficeMath object from the transformed MathML.
The OfficeMath class represents a run containing WordprocessingML which shall be handled as though it was Office Open XML Math.
For more info please refer to MSDN.
The presentation MathML does not contain font information. To get a nice result
we must add font information to the created OfficeMath object.
In the last step we have to add the OfficeMath object to our word document.
In the example below I simply search for the first Paragraph in a
word document called template.docx and add the OfficeMath object to the found paragraph.
XslCompiledTransform xslTransform = new XslCompiledTransform();
// The MML2OMML.xsl file is located under
// %ProgramFiles%\Microsoft Office\Office12\
xslTransform.Load("MML2OMML.xsl");
// Load the file containing your MathML presentation markup.
using (XmlReader reader = XmlReader.Create(File.Open("mathML.xml", FileMode.Open)))
{
using (MemoryStream ms = new MemoryStream())
{
XmlWriterSettings settings = xslTransform.OutputSettings.Clone();
// Configure xml writer to omit xml declaration.
settings.ConformanceLevel = ConformanceLevel.Fragment;
settings.OmitXmlDeclaration = true;
XmlWriter xw = XmlWriter.Create(ms, settings);
// Transform our MathML to OfficeMathML
xslTransform.Transform(reader, xw);
ms.Seek(0, SeekOrigin.Begin);
StreamReader sr = new StreamReader(ms, Encoding.UTF8);
string officeML = sr.ReadToEnd();
Console.Out.WriteLine(officeML);
// Create a OfficeMath instance from the
// OfficeMathML xml.
DocumentFormat.OpenXml.Math.OfficeMath om =
new DocumentFormat.OpenXml.Math.OfficeMath(officeML);
// Add the OfficeMath instance to our
// word template.
using (WordprocessingDocument wordDoc =
WordprocessingDocument.Open("template.docx", true))
{
DocumentFormat.OpenXml.Wordprocessing.Paragraph par =
wordDoc.MainDocumentPart.Document.Body.Descendants<DocumentFormat.OpenXml.Wordprocessing.Paragraph>().FirstOrDefault();
foreach (var currentRun in om.Descendants<DocumentFormat.OpenXml.Math.Run>())
{
// Add font information to every run.
DocumentFormat.OpenXml.Wordprocessing.RunProperties runProperties2 =
new DocumentFormat.OpenXml.Wordprocessing.RunProperties();
RunFonts runFonts2 = new RunFonts() { Ascii = "Cambria Math", HighAnsi = "Cambria Math" };
runProperties2.Append(runFonts2);
currentRun.InsertAt(runProperties2, 0);
}
par.Append(om);
}
}
}

Related

How to fill an existing pdf form with itext 7 and itextsharp 5.5.9 without flattening it?

I am trying to fill out a USCIS form and after filling it is making as read only (flattening that). I am not sure why it is doing that. Even though I don’t have any code to flatten that.
I searched the stack overflow and tried many different things (with itextsharp 5.5.9 and itext 7) but still it doesn’t work.
Here is the sample code I am using
string src = #"https://www.uscis.gov/sites/default/files/files/form/i-90.pdf";
string dest = #"C:\temp\i-90Filled.pdf";
var reader = new PdfReader(src);
reader.SetUnethicalReading(true);
var writer = new PdfWriter(dest);
PdfDocument pdfDoc = new PdfDocument(reader, writer);
// add content
PdfAcroForm form = PdfAcroForm.GetAcroForm(pdfDoc, true);
IDictionary<String, PdfFormField> fields = form.GetFormFields();
PdfFormField toSet;
fields.TryGetValue("form1[0].#subform[0].P1_Line3b_GivenName[0]", out toSet);
toSet.SetValue("John");
pdfDoc.Close();
Forms are filled like this with iTextSharp 5:
string src = #"https://www.uscis.gov/sites/default/files/files/form/i-90.pdf";
string dest = #"C:\temp\i-90Filled.pdf";
PdfReader pdfReader = new PdfReader(src);
PdfStamper pdfStamper = new PdfStamper(pdfReader, new FileStream(
dest, FileMode.Create));
AcroFields form = pdfStamper.AcroFields;
form.SetField("form1[0].#subform[0].P1_Line3b_GivenName[0]", "John");
Note that the form you are trying to fill is a hybrid form. It contains the description of the form twice: once as an AcroForm; once as an XFA form. You may want to remove the XFA form by adding this line:
form.RemoveXfa();
For more info, read the documentation:
Is it safe to remove XFA?
How to change the text color of an AcroForm field?
Your code is using iText 7 for C#. If you want that code to work, you most certainly need to remove the XFA part as iText 7 doesn't support XFA. iText 7 was developed with PDF 2.0 in mind (this spec is to be released in 2017). XFA will be deprecated in PDF 2.0. A valid PDF 2.0 file will not be allowed to contain an XFA stream.

How to edit pasted content using the Open XML SDK

I have a custom template in which I'd like to control (as best I can) the types of content that can exist in a document. To that end, I disable controls, and I also intercept pastes to remove some of those content types, e.g. charts. I am aware that this content can also be drag-and-dropped, so I also check for it later, but I'd prefer to stop or warn the user as soon as possible.
I have tried a few strategies:
RTF manipulation
Open XML manipulation
RTF manipulation is so far working fairly well, but I'd really prefer to use Open XML as I expect it to be more useful in the future. I just can't get it working.
Open XML Manipulation
The wonderfully-undocumented (as far as I can tell) "Embed Source" appears to contain a compound document object, which I can use to modify the copied content using the Open XML SDK. But I have been unable to put the modified content back into an object that lets it be pasted correctly.
The modification part seems to work fine. I can see, if I save the modified content to a temporary .docx file, that the changes are being made correctly. It's the return to the clipboard that seems to be giving me trouble.
I have tried assigning just the Embed Source object back to the clipboard (so that the other types such as RTF get wiped out), and in this case nothing at all gets pasted. I've also tried re-assigning the Embed Source object back to the clipboard's data object, so that the remaining data types are still there (but with mismatched content, probably), which results in an empty embedded document getting pasted.
Here's a sample of what I'm doing with Open XML:
using OpenMcdf;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
...
object dataObj = Forms.Clipboard.GetDataObject();
object embedSrcObj = dateObj.GetData("Embed Source");
if (embedSrcObj is Stream)
{
// read it with OpenMCDF
Stream stream = embedSrcObj as Stream;
CompoundFile cf = new CompoundFile(stream);
CFStream cfs = cf.RootStorage.GetStream("package");
byte[] bytes = cfs.GetData();
string savedDoc = Path.GetTempFileName() + ".docx";
File.WriteAllBytes(savedDoc, bytes);
// And then use the OpenXML SDK to read/edit the document:
using (WordprocessingDocument openDoc = WordprocessingDocument.Open(savedDoc, true))
{
OpenXmlElement body = openDoc.MainDocumentPart.RootElement.ChildElements[0];
foreach (OpenXmlElement ele in body.ChildElements)
{
if (ele is Paragraph)
{
Paragraph para = (Paragraph)ele;
if (para.ParagraphProperties != null && para.ParagraphProperties.ParagraphStyleId != null)
{
string styleName = para.ParagraphProperties.ParagraphStyleId.Val;
Run run = para.LastChild as Run; // I know I'm assuming things here but it's sufficient for a test case
run.RunProperties = new RunProperties();
run.RunProperties.AppendChild(new DocumentFormat.OpenXml.Wordprocessing.Text("test"));
}
}
// etc.
}
openDoc.MainDocumentPart.Document.Save(); // I think this is redundant in later versions than what I'm using
}
// repackage the document
bytes = File.ReadAllBytes(savedDoc);
cf.RootStorage.Delete("Package");
cfs = cf.RootStorage.AddStream("Package");
cfs.Append(bytes);
MemoryStream ms = new MemoryStream();
cf.Save(ms);
ms.Position = 0;
dataObj.SetData("Embed Source", ms);
// or,
// Clipboard.SetData("Embed Source", ms);
}
Question
What am I doing wrong? Is this just a bad/unworkable approach?

How can I set XFA data in a static XFA form in iTextSharp and get it to save?

I'm having a very strange issue with XFA Forms in iText / iTextSharp (iTextSharp 5.3.3 via NuGet). I am trying to fill out a static XFA styled form, however my changes are not taking.
I have both editions of iText in Action and have been consulting the second edition as well as the iTextSharp code sample conversions from the book.
Background: I have an XFA Form that I can fill out manually using Adobe Acrobat on my computer. Using iTextSharp I can read what the Xfa XML data is and see the structure of the data. I am essentially trying to mimic that with iText.
What the data looks like when I add data and save in Acrobat (note: this is only the specific section for datasets)
Here is the XML file I am trying to read in to replace the existing data (note: this is the entire contexts of that file):
However, when I pass the path to the replacement XML File in and try to set the data, the new file created (a copy of the original with the data replaced) without any errors being thrown, but the data is not being updated. I can see that the new file is created and I can open it, but there is no data in the file.
Here is the code being utilized to replace the data or populate for the first time, which is a variation of http://sourceforge.net/p/itextsharp/code/HEAD/tree/trunk/book/iTextExamplesWeb/iTextExamplesWeb/iTextInAction2Ed/Chapter08/XfaMovie.cs
public void Generate(string sourceFilePath, string destinationtFilePath, string replacementXmlFilePath)
{
PdfReader pdfReader = new PdfReader(sourceFilePath);
using (MemoryStream ms = new MemoryStream())
{
using (PdfStamper stamper = new PdfStamper(pdfReader, ms))
{
XfaForm xfaForm = new XfaForm(pdfReader);
XmlDocument doc = new XmlDocument();
doc.Load(replacementXmlFilePath);
xfaForm.DomDocument = doc;
xfaForm.Changed = true;
XfaForm.SetXfa(xfaForm, stamper.Reader, stamper.Writer);
}
var bytes = ms.ToArray();
File.WriteAllBytes(destinationtFilePath, bytes);
}
}
Any help would be very much appreciated.
I found the issue. The replacement DomDocument needs to be the entire merged XML of the new document, not just the data or datasets portion.
I upvoted your answer, because it's not incorrect (I'm happy my reference to the demo led you to have another look at your code), but now that I have a second look at your original code, I think it's better to use the book example:
public byte[] ManipulatePdf(String src, String xml) {
PdfReader reader = new PdfReader(src);
using (MemoryStream ms = new MemoryStream()) {
using (PdfStamper stamper = new PdfStamper(reader, ms)) {
AcroFields form = stamper.AcroFields;
XfaForm xfa = form.Xfa;
xfa.FillXfaForm(XmlReader.Create(new StringReader(xml)));
}
return ms.ToArray();
}
}
As you can see, it's not necessary to replace the whole XFA XML. If you use the FillXfaForm method, the data is sufficient.
Note: for the C# version of the examples, see http://tinyurl.com/iiacsCH08 (change the 08 into a number from 01 to 16 for the examples of the other chapters).

OpenXML editing docx

I have a dynamically generated docx file.
Need write the text strictly to end of page.
With Microsoft.Interop i insert Paragraphs before text:
int kk = objDoc.ComputeStatistics(WdStatistic.wdStatisticPages, ref wMissing);
while (objDoc.ComputeStatistics(WdStatistic.wdStatisticPages, ref wMissing) != kk + 1)
{
objWord.Selection.TypeParagraph();
}
objWord.Selection.TypeBackspace();
But i can't use same code with Open XML, because pages.count calculated only by word.
Using interop impossible, because it so slowwwww.
There are 2 options of doing this in Open XML.
create Content Place holder from Microsoft Office Developer Tab at the end of your document and now you can access this Content Place Holder programatically and can place any text in it.
you can append text driectly to your word document where it will be inserted at the end of your text. In this approach you got to write all the stuff to your document first and once you are done than you can append your document the following way
//
public void WriteTextToWordDocument()
{
using(WordprocessingDocument doc = WordprocessingDocument.Open(documentPath, true))
{
MainDocumentPart mainPart = doc.MainDocumentPart;
Body body = mainPart.Document.Body;
Paragraph paragraph = new Paragraph();
Run run = new Run();
Text myText = new Text("Append this text at the end of the word document");
run.Append(myText);
paragraph.Append(run);
body.Append(paragraph);
// dont forget to save and close your document as in the following two lines
mainPart.Document.Save();
doc.Close();
}
}
I haven't tested the above code but hope it will give you an idea of dealing with word document in OpenXML.
Regards,

How to get text from textbox of MS word document using Apache POI?

I want to get information written in Textbox in an MS word document. I am using Apache POI to parse word document.
Currently I am iterating through all the Paragraph objects but this Paragraph list does not contain information from TextBox so I am missing this information in output.
e.g.
paragraph in plain text
**<some information in text box>**
one more paragraph in plain text
what i want to extract :
<para>paragraph in plain text</para>
<text_box>some information in text box</text_box>
<para>one more paragraph in plain text</para>
what I am getting currently :
paragraph in plain text
one more paragraph in plain text
Anyone knows how to extract information from text box using Apache POI?
This worked for me,
private void printContentsOfTextBox(XWPFParagraph paragraph) {
XmlObject[] textBoxObjects = paragraph.getCTP().selectPath("
declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'
declare namespace wps='http://schemas.microsoft.com/office/word/2010/wordprocessingShape'
declare namespace v='urn:schemas-microsoft-com:vml'
.//*/wps:txbx/w:txbxContent | .//*/v:textbox/w:txbxContent");
for (int i =0; i < textBoxObjects.length; i++) {
XWPFParagraph embeddedPara = null;
try {
XmlObject[] paraObjects = textBoxObjects[i].
selectChildren(
new QName("http://schemas.openxmlformats.org/wordprocessingml/2006/main", "p"));
for (int j=0; j<paraObjects.length; j++) {
embeddedPara = new XWPFParagraph(
CTP.Factory.parse(paraObjects[j].xmlText()), paragraph.getBody());
//Here you have your paragraph;
System.out.println(embeddedPara.getText());
}
} catch (XmlException e) {
//handle
}
}
}
To extract all occurrences of text from Word .doc and .docx files for crgrep I used the Apache Tika source as a reference of how the Apache POI APIs should be correctly used. This is useful if you want to use POI directly and not depend on Tika.
For Word .docx files, take a look at this Tika class:
org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator
if you ignore XHTMLContentHandler and formatting code you can see how to navigate a XWPFDocument correctly using POI.
For .doc files this class is helpful:
org.apache.tika.parser.microsoft.WordExtractor
both from the tika-parsers-1.x.jar. An easy way to access the Tika code through your maven dependencies is add Tika temporarily to your pom.xml such as
<dependency>
<groupId>org.apache.tika</groupId>
<artifactId>tika-parsers</artifactId>
<version>1.7</version>
</dependency>
let your IDE resolve attached source and step into the classes above.
If you want to get text from textbox in docx file (using POI 3.10-FINAL) here is sample code:
FileInputStream fileInputStream = new FileInputStream(inputFile);
XWPFDocument document = new XWPFDocument(OPCPackage.open(fileInputStream));
for (XWPFParagraph xwpfParagraph : document.getParagraphs()) {
String text = xwpfParagraph.getParagraphText(); //here is where you receive text from textbox
}
Or you can iterate over each
XWPFRun in XWPFParagraph and invoke toString() method. Same result.