How to remove a Bookmark from a document? - openxml

Because OOXML documents don't seem to follow proper XML rules, a Bookmark consists of a BookmarkStart, a BookmarkEnd and an arbitrary number of elements in-between; not a hierarchy but a single flow of elements which have to be traversed in the right order:
<w:bookmarkStart w:id="4" w:name="Author"/>
<w:r w:rsidR="009878B3"><w:rPr><w:sz w:val="28"/></w:rPr><w:t>&lt;</w:t></w:r>
<w:r w:rsidR="005E0909"><w:rPr><w:sz w:val="28"/></w:rPr><w:t xml:space="preserve"> </w:t></w:r>
<w:r w:rsidR="009878B3"><w:rPr><w:sz w:val="28"/></w:rPr><w:t>Author&gt;</w:t></w:r>
<w:bookmarkEnd w:id="4"/>
I already ran into this problem in a related question: https://stackoverflow.com/questions/28219201/how-to-get-the-text-of-a-bookmark-as-a-single-string
But this question is, how can I remove the Bookmark entirely from the document, without breaking anything? Do I have to iterate through siblings from the BookmarkStart until I reach a BookmarkEnd? Is there some useful API method which makes up for the failure to use XML properly whereby one would just have a Bookmark node which could be deleted?!

You can just remove the BookmarkStart via the API and it will remove the corresponding BookmarkEnd for you and leave any contents intact. In C# something like this should work:
public static void RemoveBookmark(string filename, string bookmarkName)
{
using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(filename, true))
{
Body body = wordDocument.MainDocumentPart.Document.Body;
//find a matching BookmarkStart based on name
BookmarkStart start = body.Descendants<BookmarkStart>().FirstOrDefault(b => b.Name == bookmarkName);
if (start == null)
{
throw new Exception(string.Format("Bookmark {0} not found", bookmarkName));
}
//this is clever enough to remove the BookmarkStart and BookmarkEnd
start.Remove();
wordDocument.MainDocumentPart.Document.Save();
}
}
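If you would rather not depend on Remove() taking care of the end marker, the pairing can also be done explicitly, since a bookmarkStart and its bookmarkEnd share the same w:id. The following is only a minimal sketch at the raw XML level using LINQ to XML. It assumes the main part's XML has already been loaded into an XElement, and BookmarkXml and RemoveBookmarkMarkers are hypothetical names, not part of the Open XML SDK.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class BookmarkXml
{
    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Removes the start and end markers of the named bookmark and leaves
    // everything between them in place. Returns false if no such bookmark exists.
    public static bool RemoveBookmarkMarkers(XElement root, string bookmarkName)
    {
        XElement start = root.Descendants(W + "bookmarkStart")
            .FirstOrDefault(b => (string)b.Attribute(W + "name") == bookmarkName);
        if (start == null)
        {
            return false;
        }

        // The end marker carries no name, only the id it shares with the start.
        string id = (string)start.Attribute(W + "id");

        // Materialize the matches first so nothing is removed while the
        // query is still being enumerated.
        var ends = root.Descendants(W + "bookmarkEnd")
            .Where(e => (string)e.Attribute(W + "id") == id)
            .ToList();

        start.Remove();
        foreach (XElement end in ends)
        {
            end.Remove();
        }
        return true;
    }
}
```

Because the runs between the two markers are siblings rather than children, nothing else has to be moved or re-parented.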

How to edit pasted content using the Open XML SDK

I have a custom template in which I'd like to control (as best I can) the types of content that can exist in a document. To that end, I disable controls, and I also intercept pastes to remove some of those content types, e.g. charts. I am aware that this content can also be drag-and-dropped, so I also check for it later, but I'd prefer to stop or warn the user as soon as possible.
I have tried a few strategies:
RTF manipulation
Open XML manipulation
RTF manipulation is so far working fairly well, but I'd really prefer to use Open XML as I expect it to be more useful in the future. I just can't get it working.
Open XML Manipulation
The wonderfully-undocumented (as far as I can tell) "Embed Source" appears to contain a compound document object, which I can use to modify the copied content using the Open XML SDK. But I have been unable to put the modified content back into an object that lets it be pasted correctly.
The modification part seems to work fine. I can see, if I save the modified content to a temporary .docx file, that the changes are being made correctly. It's the return to the clipboard that seems to be giving me trouble.
I have tried assigning just the Embed Source object back to the clipboard (so that the other types such as RTF get wiped out), and in this case nothing at all gets pasted. I've also tried re-assigning the Embed Source object back to the clipboard's data object, so that the remaining data types are still there (but with mismatched content, probably), which results in an empty embedded document getting pasted.
Here's a sample of what I'm doing with Open XML:
using OpenMcdf;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
...
Forms.IDataObject dataObj = Forms.Clipboard.GetDataObject();
object embedSrcObj = dataObj.GetData("Embed Source");
if (embedSrcObj is Stream)
{
// read it with OpenMCDF
Stream stream = embedSrcObj as Stream;
CompoundFile cf = new CompoundFile(stream);
CFStream cfs = cf.RootStorage.GetStream("package");
byte[] bytes = cfs.GetData();
string savedDoc = Path.GetTempFileName() + ".docx";
File.WriteAllBytes(savedDoc, bytes);
// And then use the OpenXML SDK to read/edit the document:
using (WordprocessingDocument openDoc = WordprocessingDocument.Open(savedDoc, true))
{
OpenXmlElement body = openDoc.MainDocumentPart.RootElement.ChildElements[0];
foreach (OpenXmlElement ele in body.ChildElements)
{
if (ele is Paragraph)
{
Paragraph para = (Paragraph)ele;
if (para.ParagraphProperties != null && para.ParagraphProperties.ParagraphStyleId != null)
{
string styleName = para.ParagraphProperties.ParagraphStyleId.Val;
Run run = para.LastChild as Run; // I know I'm assuming things here but it's sufficient for a test case
run.RunProperties = new RunProperties();
run.RunProperties.AppendChild(new DocumentFormat.OpenXml.Wordprocessing.Text("test"));
}
}
// etc.
}
openDoc.MainDocumentPart.Document.Save(); // I think this is redundant in later versions than what I'm using
}
// repackage the document
bytes = File.ReadAllBytes(savedDoc);
cf.RootStorage.Delete("Package");
cfs = cf.RootStorage.AddStream("Package");
cfs.Append(bytes);
MemoryStream ms = new MemoryStream();
cf.Save(ms);
ms.Position = 0;
dataObj.SetData("Embed Source", ms);
// or,
// Clipboard.SetData("Embed Source", ms);
}
Question
What am I doing wrong? Is this just a bad/unworkable approach?

Sequence of Project Deployment in Mule

Does anybody know in what sequence Mule projects are loaded when Mule starts up?
It doesn't seem to be in alphabetical order or last updated time.
If you look at the class MuleDeploymentService, you can see the following:
String appString = (String) options.get("app");
if (appString == null)
{
String[] explodedApps = appsDir.list(DirectoryFileFilter.DIRECTORY);
String[] packagedApps = appsDir.list(ZIP_APPS_FILTER);
deployPackedApps(packagedApps);
deployExplodedApps(explodedApps);
}
else
{
String[] apps = appString.split(":");
The documentation for the method File.list states that "there is no guarantee that the name strings in the resulting array will appear in any specific order". So I guess the answer is: in no particular order, unless you use the -app option, in which case the apps are deployed in the order they are listed.

Converting programmatically created openXML word documents to HTML errors with nulls

I'm creating WordprocessingDocuments using Open XML (which works fine, and the produced Word doc is exactly what I want). Now I'm trying to convert these newly created docs to HTML using the Open XML PowerTools. I'm new to this, so I'm hoping it's something stupid that I'm missing, but I was hoping someone could point me in the right direction with the null reference errors I'm receiving.
This is the exact error...
System.NullReferenceException: Object reference not set to an instance of an object.
at OpenXmlPowerTools.HtmlConverter.ConvertToHtmlTransform(WordprocessingDocument wordDoc, HtmlConverterSettings settings, XNode node, Func`2 imageHandler)
at OpenXmlPowerTools.HtmlConverter.<>c__DisplayClass37.<ConvertToHtmlTransform>b__1d(XElement e)
at System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()
at System.Xml.Linq.XContainer.AddContentSkipNotify(Object content)
at System.Xml.Linq.XElement..ctor(XName name, Object content)
at OpenXmlPowerTools.HtmlConverter.ConvertToHtmlTransform(WordprocessingDocument wordDoc, HtmlConverterSettings settings, XNode node, Func`2 imageHandler)
at OpenXmlPowerTools.HtmlConverter.<>c__DisplayClass37.<ConvertToHtmlTransform>b__1c(XElement e)
at System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()
at System.Xml.Linq.XContainer.AddContentSkipNotify(Object content)
at System.Xml.Linq.XContainer.AddContentSkipNotify(Object content)
at System.Xml.Linq.XElement..ctor(XName name, Object[] content)
at OpenXmlPowerTools.HtmlConverter.ConvertToHtmlTransform(WordprocessingDocument wordDoc, HtmlConverterSettings settings, XNode node, Func`2 imageHandler)
I'm using the exact same code you can find on Eric White's blog.
public static void PrintHTML(string file)
{
byte[] byteArray = File.ReadAllBytes(file);
using (MemoryStream memoryStream = new MemoryStream())
{
memoryStream.Write(byteArray, 0, byteArray.Length);
using (WordprocessingDocument doc =
WordprocessingDocument.Open(memoryStream, true))
{
HtmlConverterSettings settings = new HtmlConverterSettings()
{
//PageTitle = "some title"
};
XElement html = HtmlConverter.ConvertToHtml(doc, settings);
File.WriteAllText(@"C:\Temp\Test.html", html.ToStringNewLineOnAttributes());
}
}
}
I know the code works because if I pass it a normal Word doc that I haven't created, it converts to HTML fine. If I create a Word doc using Open XML, then manually copy the contents into a new Word file, save it, and pass it through the conversion code, that works as well. So I'm thinking it must be something to do with the way I'm creating the Word doc in Open XML initially. Maybe I'm not adding a part to the file that is required.
Using the openxml sdk I have compared a working and non working file and they appear to have the same components/parts.
From the errors I've posted does anyone have any ideas of where the problem could be, ie, what is null? I can post the creation code for the word doc but it's quite extensive and it might just confuse people more.
I finally got to the bottom of this. I had to dig out the source code for the HtmlConverter in the Open XML PowerTools, and after some debugging I found that this line in the code was failing:
line 371
styleId = (string)wordDoc.MainDocumentPart.StyleDefinitionsPart
.GetXDocument().Root.Elements(W.style)
.Where(e => (string)e.Attribute(W.type) == "paragraph" &&
(string)e.Attribute(W._default) == "1")
.FirstOrDefault().Attributes(W.styleId).FirstOrDefault();
Basically, in my debugging, the expression
(string)e.Attribute(W._default)
was returning true or false rather than "1",
so I changed the following lines
.Where(e => (string)e.Attribute(W.type) == "paragraph" &&
(string)e.Attribute(W._default) == "1")
to
.Where(e => (string)e.Attribute(W.type) == "paragraph" && (
(string)e.Attribute(W._default) == "1" || (string)e.Attribute(W._default) == "true"))
and now it works as expected.
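The same idea can be written so that a missing default paragraph style does not throw either. This is only a sketch of the lookup, not the PowerTools source: it uses plain LINQ to XML, StyleDefaults, IsOn and DefaultParagraphStyleId are hypothetical names, and it assumes that w:default may be spelled "1", "true" or "on" (I believe "on" is also allowed in transitional documents, but treat that as an assumption).

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class StyleDefaults
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Treats the usual "on" spellings of an on/off attribute as true.
    // A missing attribute (null) counts as false.
    public static bool IsOn(XAttribute attribute)
    {
        string value = (string)attribute; // the cast yields null for a null attribute
        return value == "1" || value == "true" || value == "on";
    }

    // Returns the styleId of the default paragraph style, or null when the
    // styles part does not declare one.
    public static string DefaultParagraphStyleId(XElement stylesRoot)
    {
        XElement style = stylesRoot.Elements(W + "style")
            .FirstOrDefault(s => (string)s.Attribute(W + "type") == "paragraph"
                              && IsOn(s.Attribute(W + "default")));
        return style == null ? null : (string)style.Attribute(W + "styleId");
    }
}
```

Returning null instead of dereferencing the result of FirstOrDefault() lets the caller decide what to do when no default style is declared.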
Had the same issue where I was saving a reportbuilder report to OpenWordXML and could not convert the bytes to html.
Had to add the following line of code for it to work correctly with version 2.8.1.0
private static IEnumerable<XElement> ParaStyleParaPropsStack(XDocument stylesXDoc,
string paraStyleName, XElement para)
{
if (stylesXDoc == null)
yield break;
var localParaStyleName = paraStyleName;
while (localParaStyleName != null)
{
XElement paraStyle = stylesXDoc.Root.Elements(W.style).FirstOrDefault(s
=>
s.Attribute(W.type) != null && // the line that was added
s.Attribute(W.type).Value == "paragraph" &&
s.Attribute(W.styleId).Value == localParaStyleName);

How do I unlock a content control using the OpenXML SDK in a Word 2010 document?

I am manipulating a Word 2010 document on the server-side and some of the content controls in the document have the following Locking properties checked
Content control cannot be deleted
Contents cannot be edited
Can anyone advise how to set these Locking options to false, or remove them altogether, using the OpenXML SDK?
The openxml SDK provides the Lock class and the LockingValues enumeration
for programmatically setting the options:
Content control cannot be deleted and
Contents cannot be edited
So, to set those two options to "false" (LockingValues.Unlocked),
search for all SdtElement elements in the document and set the Val property to
LockingValues.Unlocked.
The code below shows an example:
static void UnlockAllSdtContentElements()
{
using (WordprocessingDocument wordDoc =
WordprocessingDocument.Open(@"c:\temp\myword.docx", true))
{
IEnumerable<SdtElement> elements =
wordDoc.MainDocumentPart.Document.Descendants<SdtElement>();
foreach (SdtElement elem in elements)
{
if (elem.SdtProperties != null)
{
Lock l = elem.SdtProperties.ChildElements.First<Lock>();
if (l == null)
{
continue;
}
if (l.Val == LockingValues.SdtContentLocked)
{
Console.Out.WriteLine("Unlock content element...");
l.Val = LockingValues.Unlocked;
}
}
}
}
}
static void Main(string[] args)
{
UnlockAllSdtContentElements();
}
Just for those who copy this code: keep in mind that if no lock is associated with the content control, there won't be a Lock element under its properties, so the following instruction will throw an exception because no element is found:
Lock l = elem.SdtProperties.ChildElements.First<Lock>();
The way to fix this is to use FirstOrDefault instead of First.
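For the "remove them altogether" part of the question, the lock can also simply be deleted, since a content control without a w:lock element is not locked. Below is a minimal sketch at the raw XML level with LINQ to XML. It assumes the main part's XML is available as an XElement, and SdtLocks and RemoveAllLocks are hypothetical names rather than SDK API.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class SdtLocks
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Deletes every w:lock element that sits directly under a w:sdtPr and
    // returns how many were removed. Controls without a lock are skipped
    // naturally, so there is no "element not found" case to handle.
    public static int RemoveAllLocks(XElement root)
    {
        // Snapshot the matches so the tree is not changed during enumeration.
        var locks = root.Descendants(W + "sdtPr")
            .Elements(W + "lock")
            .ToList();

        foreach (XElement lockElement in locks)
        {
            lockElement.Remove();
        }
        return locks.Count;
    }
}
```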

merge word documents to a single document

I used the code at the link below to merge Word files into a single file:
http://devpinoy.org/blogs/keithrull/archive/2007/06/09/updated-how-to-merge-multiple-microsoft-word-documents.aspx
However, looking at the output file, I realized that it did not copy the header image from the first document. How do we merge documents while preserving format and content?
I suggest using GroupDocs.Merger Cloud to merge multiple Word documents into a single Word document; it keeps the formatting and contents of the source documents. It is a platform-independent REST API solution that does not depend on any third-party tool or software.
Sample C# code:
var configuration = new GroupDocs.Merger.Cloud.Sdk.Client.Configuration(MyAppSid, MyAppKey);
var apiInstance_Document = new GroupDocs.Merger.Cloud.Sdk.Api.DocumentApi(configuration);
var apiInstance_File = new GroupDocs.Merger.Cloud.Sdk.Api.FileApi(configuration);
var pathToSourceFiles = @"C:/Temp/input/";
var remoteFolder = "Temp/";
var joinItem_list = new List<JoinItem>();
try
{
DirectoryInfo dir = new DirectoryInfo(pathToSourceFiles);
System.IO.FileInfo[] files = dir.GetFiles();
foreach (System.IO.FileInfo file in files)
{
var request_upload = new GroupDocs.Merger.Cloud.Sdk.Model.Requests.UploadFileRequest(remoteFolder + file.Name, File.Open(file.FullName, FileMode.Open));
var response_upload = apiInstance_File.UploadFile(request_upload);
var item = new JoinItem
{
FileInfo = new GroupDocs.Merger.Cloud.Sdk.Model.FileInfo
{ FilePath = remoteFolder + file.Name }
};
joinItem_list.Add(item);
}
var options = new JoinOptions
{
JoinItems = joinItem_list,
OutputPath = remoteFolder + "Merged_Document.docx"
};
var request = new JoinRequest(options);
var response = apiInstance_Document.Join(request);
Console.WriteLine("Output file path: " + response.Path);
}
catch (Exception e)
{
Console.WriteLine("Exception while Merging Documents: " + e.Message);
}
That code is inserting a page break after each file.
Since sections control headers, if a second or subsequent document has a header, you'll probably be wanting to keep the original section properties, and insert those after your first document.
If you look at your original document as a docx, you'll probably see that your section is a document level section properties element.
The easiest way around your problem may be to create a second section properties element inside the last paragraph (which contains the header information). Then this should just stay there when the documents are merged (ie other paragraphs added after it).
That's the theory. See also http://www.pcreview.co.uk/forums/thread-898133.php
But I haven't tried it; it assumes InsertFile behaves as I expect it should.
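To make the idea concrete, here is a rough sketch of copying the document-level section properties into the last paragraph, again at the raw XML level with LINQ to XML. It is untested against InsertFile and rests on several assumptions: the body ends with a paragraph followed by the w:sectPr, the XML is available as an XElement, and SectionProps and CopySectionPropertiesIntoLastParagraph are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class SectionProps
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Copies the body-level w:sectPr into the w:pPr of the last paragraph,
    // so the section settings (including header references) travel with
    // that paragraph. Returns false if the body does not have the expected shape.
    public static bool CopySectionPropertiesIntoLastParagraph(XElement body)
    {
        XElement sectPr = body.Element(W + "sectPr");
        if (sectPr == null)
        {
            return false;
        }

        // Only handle the simple case where a paragraph directly precedes the sectPr.
        XElement last = sectPr.ElementsBeforeSelf().LastOrDefault();
        if (last == null || last.Name != W + "p")
        {
            return false;
        }

        XElement pPr = last.Element(W + "pPr");
        if (pPr == null)
        {
            pPr = new XElement(W + "pPr");
            last.AddFirst(pPr); // pPr has to be the first child of the paragraph
        }
        if (pPr.Element(W + "sectPr") != null)
        {
            return false; // the paragraph already ends a section
        }

        XElement copy = new XElement(sectPr); // deep copy, the original stays in place
        XElement change = pPr.Element(W + "pPrChange");
        if (change != null)
        {
            change.AddBeforeSelf(copy); // sectPr goes before a tracked-change record
        }
        else
        {
            pPr.Add(copy);
        }
        return true;
    }
}
```

The copy is made before merging, so that paragraphs appended afterwards start a new section while the first document keeps its header.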