How to modify a DocX file and save to a different location with OpenXML SDK?

How to modify a DocX file and save to a different location with OpenXML SDK? - openxml

I want to use OpenXML SDK 2.0 to do the following:
Open A.docx
Modify the document
Save the modified document as B.docx
A & B would be parameters to a method and they could be the same. Assuming they are not the same, A should not be modified at all.
I cannot see a "SaveAs" method, in fact `WordprocessingDocument" class doesn't really seem to support concept of file location.
How should I do this?

I use a memory stream and pass it to the WordprocessingDocument.Open method. After I'm done changing the document, I just write the bytes to the destination:
var source = File.ReadAllBytes(filename);
using (var ms = new MemoryStream()) {
ms.Write(source, 0, source.Length);
/* settings defined elsewhere */
using (var doc = WordprocessingDocument.Open(ms, true, settings)) {
/* do something to the doc */
}
/* used in File.WriteAllBytes elsewhere */
return ms.ToArray();
}

+1 on the answer already given...
Here is an MSDN article that discusses working with in-memory Open XML documents. I think that you will find it relevant.
http://msdn.microsoft.com/en-us/library/office/ee945362.aspx

Related

WordprocessingDocument.CreateFromTemplate method creates corrupted MS Word files

I have proper .dotm template.
When I create a new file based on a template by double clicking in explorer it creates the correct file (based on this template). Created file size after save is 16Kb (without any content).
But if I want to use .CreateFromTemplate method in my code I cannot open a newly created .docx file in MS Word.
New file size is 207Kb (just like .dotm file). MS Word display "run-time error 5398" and not open the file.
I'm using nuget package DocumentFormat.OpenXml 2.19.0, Word 365 version 16.0.14931.20648 - 32bit and code like this:
using (WordprocessingDocument doc = WordprocessingDocument.CreateFromTemplate(templatePath))
{
doc.SaveAs(newFileName);
}
Google is silent about this error, ChatGPT says that:
The "Run-time Error 5398" error means that the file you are trying to open is corrupted or not a valid docx file. Possible reasons for this error may be the following:
The file was not saved correctly after making changes. Verify that the Save() method was called after making changes to the file.
The file was saved with the wrong extension, e.g. as DOTM instead of DOCX
The file was saved in an invalid format.
There may have been some unhandled exceptions in your code.
When I manually change the extension of a new file from docx to dotm, there is no error when opening, but the file does not open.
What am I doing wrong with CreateFromTemplate method?

I tried to reproduce the behavior you described, using the following unit tests:
public sealed class CreateFromTemplateTests
{
private readonly ITestOutputHelper _output;
public CreateFromTemplateTests(ITestOutputHelper output)
{
_output = output;
}
[Theory]
[InlineData("c:\\temp\\MacroEnabledTemplate.dotm", "c:\\temp\\MacroEnabledDocument.docm")]
[InlineData("c:\\temp\\Template.dotx", "c:\\temp\\Document.docx")]
public void CanCreateDocmFromDotm(string templatePath, string documentPath)
{
// Let's not attach the template, which is done by default. If a template is attached, the validator complains as follows:
// The element has unexpected child element 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:attachedTemplate'.
using (var wordDocument = WordprocessingDocument.CreateFromTemplate(templatePath, false))
{
// Validate the document as created with CreateFromTemplate.
ValidateOpenXmlPackage(wordDocument);
// Save that document to disk so we can open it with Word, for example.
wordDocument.SaveAs(documentPath).Dispose();
}
using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(documentPath, true))
{
// Validate the document that was opened from disk, just to see what Word would open.
ValidateOpenXmlPackage(wordDocument);
}
}
private void ValidateOpenXmlPackage(OpenXmlPackage openXmlPackage)
{
OpenXmlValidator validator = new(FileFormatVersions.Office2019);
List<ValidationErrorInfo> validationErrors = validator.Validate(openXmlPackage).ToList();
foreach (ValidationErrorInfo validationError in validationErrors)
{
_output.WriteLine(validationError.Description);
}
if (validationErrors.Any())
{
// Note that Word will most often be able to open the document even if there are validation errors.
throw new Exception("The validator found validation errors.");
}
}
}
In both tests, the documents are created without an issue. Looking at the Open XML markup, both documents look fine. However, while I don't get any runtime error, Word also does not open the macro-enabled document.
I am not sure why that happens. It might be related to your security settings.
Depending on whether or not you really need to use CreateFromTemplate(), you could create a .docm (rather than a .dotm) and create new macro-enabled documents by copying that .docm.
I opened an issue in the Open XML SDK project on GitHub.

VSCode: obtain editor content in Language Server

I'm trying to develop a Language Server to a new language in VS Code and I'm using the Microsoft sample as reference (https://github.com/microsoft/vscode-extension-samples/tree/master/lsp-sample).
In their sample the autocompletion is done in this chunk of code:
connection.onCompletion(
(_textDocumentPosition: TextDocumentPositionParams): CompletionItem[] => {
// The pass parameter contains the position of the text document in
// which code complete got requested. For the example we ignore this
// info and always provide the same completion items.
return [
{
label: 'TypeScript',
kind: CompletionItemKind.Text,
data: 1
},
{
label: 'JavaScript',
kind: CompletionItemKind.Text,
data: 2
}
];
}
);
As the comment says, it's a dumb autocompletion system, since it always provides the same suggestions.
I can see that there's an input parameter of type TextDocumentPositionParams and this type has the following interface:
export interface TextDocumentPositionParams {
/**
* The text document.
*/
textDocument: TextDocumentIdentifier;
/**
* The position inside the text document.
*/
position: Position;
}
It has the cursor position and a TextDocumentIdentifier but the last only has a uri property.
I want to create an intelligent autocomplete system, based on the type of the object of the word in the cursor position.
This sample is very limited and I'm kinda lost here. I guess I could read the file in the uri property and based on the cursor position I could figure out which items I should suggest. But how about when the file is not saved? If I read the file I would read the data that is on disk, and not what is currently shown in the editor.
What's the best approach to do that?

The Language Server Protocol supports text synchronization, see TextDocumentSyncOptions in ServerCapabilities and the corresponding methods (textDocument/didChange, didChange, didClose...). A Language Server will usually keep a copy of all open documents in memory.
The sample you linked actually makes use of this, but the synchronization itself is abstracted away into the TextDocuments class from vscode-languageserver. As a result server.ts doesn't have to do much more than this:
let documents: TextDocuments = new TextDocuments();
[...]
documents.listen(connection);
You can then simply use documents.get(uri).getText() to obtain the text shown in the editor.

How to create pdf package using PdfBox?

I was migrating some code (originally using iText) to use PdfBox for PDF merging. All went fine except creating PDF packages or portfolios. I have to admit I was not aware that this existed until now.
this is a snippet of my code (using iText):
PdfStamper stamper = new PdfStamper(reader, out);
stamper.makePackage(PdfName.T);
stamper.close();
I need this but with PdfBox.
I'm looking into API and docs for both and I can not find a solution atm. Any help would be great.
PS. Sorry if I made impression that I need solution in iText, I need it in PdfBox because migration is going from iText to PdfBox.

As far as I know PDFBox does not contain a single, dedicated method for that task. On the other hand it is fairly easy to use existing generic PDFBox methods to implement it.
First of all, the task is effectively defined to do the equivalent to
stamper.makePackage(PdfName.T);
using PDFBox. That method in iText is documented as:
/**
* This is the most simple way to change a PDF into a
* portable collection. Choose one of the following names:
* <ul>
* <li>PdfName.D (detailed view)
* <li>PdfName.T (tiled view)
* <li>PdfName.H (hidden)
* </ul>
* Pass this name as a parameter and your PDF will be
* a portable collection with all the embedded and
* attached files as entries.
* #param initialView can be PdfName.D, PdfName.T or PdfName.H
*/
public void makePackage( final PdfName initialView )
Thus, we need to change a PDF (fairly minimally) to make it a portable collection with a tiled view.
According to section 12.3.5 "Collections" of ISO 32000-1 (I don't have part two yet) this means we have to add a Collection dictionary to the PDF catalog with a View entry with value T. Thus:
PDDocument pdDocument = PDDocument.load(...);
COSDictionary collectionDictionary = new COSDictionary();
collectionDictionary.setName(COSName.TYPE, "Collection");
collectionDictionary.setName("View", "T");
PDDocumentCatalog catalog = pdDocument.getDocumentCatalog();
catalog.getCOSObject().setItem("Collection", collectionDictionary);
pdDocument.save(...);

Combine two PDF-a documents using ITextSharp

hoping that someone can see the flaw in my code to merge to PDF-a documents using ITextSharp. Currently it complains about missing metadata which PDF-a requires.
Document document = new Document();
MemoryStream ms = new MemoryStream();
using (PdfACopy pdfaCopy = new PdfACopy(document, ms, PdfAConformanceLevel.PDF_A_1A))
{
document.Open();
using (PdfReader reader = new PdfReader("Doc1.pdf"))
{
pdfaCopy.AddDocument(reader);
}
using (PdfReader reader = new PdfReader("doc2.pdf"))
{
pdfaCopy.AddDocument(reader);
}
}
The exact error received is
Unhandled Exception: iTextSharp.text.pdf.PdfAConformanceException: The document catalog dictionary of a PDF/A conforming file shall contain
the Metadata key
I was hoping that the 'document catalog dictionary' would be copied as well, but I guess the 'new Document()' creates an empty non-conforming document or something.
Thanks! Hope you can help
Wouter

You need to add this line:
copy.CreateXmpMetadata();
This will create some default XMP metadata. Of course: if you want to create your own XMP file containing info about the documents you're about to merge, you can also use:
copy.XmpMetadata = myMetaData;
where myMetaData is a byte array containing a correct XMP stream.
I hope you understand that iText can't automatically create the correct metadata. Providing metadata is something that needs human attention.

Check for Valid Image using getFilesAsync()

Using WinJS, while looping through a directory, how to retrieve only images in that particular directory and ignoring any other file extension, including the DoubleDots .. and the SingleDot . etc?
Something like:
var dir = Windows.Storage.KnownFolders.picturesLibrary;
dir.getFilesAsync().done(function (filesFound) {
for(var i=0; i < filesFound.length; i++){}
if(filesFound[i] IS_REALLY_AN_IMAGE_(jpeg,jpg,png,gif Only)){
//Retrieve it now!
}else{
//Escape it.
}
}})

Instead of trying to process pathnames, it will work much better to use a file query, which lets the file system do the search/filtering for you. A query also allows you to listen for the query's contentschanged event if you want to dynamically track the folder contents rather than explicitly enumerating again.
A query is created via StorageFolder.createFileQuery, createFolderQuery, or other variants. In your particular case, where you want to filter by file types, you can use createFileQueryWithOptions. This function takes a QueryOptions object which you can initialize with an array of file types. For example:
var picturesLibrary = Windows.Storage.KnownFolders.picturesLibrary;
var options = new Windows.Storage.Search.QueryOptions(
Windows.Storage.Search.CommonFileQuery.orderByName, [".jpg", ".jpeg", ".png", ".gif"]);
//Could also use orderByDate instead of orderByName
if (picturesLibrary.areQueryOptionsSupported(options)) {
var query = picturesLibrary.createFileQueryWithOptions(options);
showResults(query.getFilesAsync());
}
where showResults is some function that takes the promise from query.getFilesAsync and iterates as needed.
I go into this subject at length in Chapter 11 of my free ebook, Programming Windows Store Apps with HTML, CSS, and JavaScript, 2nd Edition, in the section "Folders and Folder Queries". Also refer to the Programmatic file search sample, as I do in the book.
When you want to display the image files, be sure to use thumbnails instead of loading the whole image (images are typically much larger than a display). That is, for each StorageFile, call its getThumbnailAsync or getScaledImageAsThumbnailAsync method. Pass the resulting thumbnail (blob) to URL.createObjectURL which returns a URL you can assign to an img.src attribute. Or you can use a WinJS.UI.ListView control, but that's another topic altogether (see Chapter 7 of my book).

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse