How to properly close Word documents after Documents.Open - ms-word

I have the following code for a C# console app. It parses a Word document for textboxes and inserts the same text into the document at the textbox anchor point with markup. This is so I can convert to Markdown using pandoc, including textbox content which is not available due to https://github.com/jgm/pandoc/issues/3086. I can then replace my custom markup with markdown after conversion.
The console app is called in a PowerShell loop for all documents in a target list.
When I first run the Powershell script, all documents are opened and saved (with a new name) without error. But the next time I run it, I get an occasional popup error:
The last time you opened '' it caused a serious error. Do you still want to open it?
I can get through this by selecting yes on every popup, but this requires intervention and is tedious and slow. I want to know why this code results in this problem?
string path = args[0];
Console.WriteLine($"Parsing {path}");
Application word = new Application();
Document doc = word.Documents.Open(path);
try
{
foreach (Shape shp in doc.Shapes)
{
if (shp.TextFrame.HasText != 0)
{
string text = shp.TextFrame.TextRange.Text;
int page = shp.Anchor.Information[WdInformation.wdActiveEndPageNumber];
string summary = Regex.Replace(text, #"\r\n?|\n", " ");
Console.WriteLine($"++++textbox++++ Page {page}: {summary.Substring(0, Math.Min(summary.Length, 40))}");
string newtext = #$"{Environment.NewLine}TEXTBOX START%%%{text}%%%TEXTBOX END{Environment.NewLine}";
var range = shp.Anchor;
range.InsertBefore(Environment.NewLine);
range.Collapse();
range.Text = newtext;
range.set_Style(WdBuiltinStyle.wdStyleNormal);
}
}
string newFile = Path.GetFullPath(path) + ".notb.docx";
doc.SaveAs2(newFile);
}
finally
{
doc.Close();
word.Quit();
}

The console app is called in a PowerShell loop for all documents in a target list.
You can automate Word from your PowerShell script directly without involving any other dependencies. At least that will allow you to keep a single Word instance without creating each time a new Word Application instance for each document:
Application word = new Application();
Document doc = word.Documents.Open(path);
In the loop you could just open documents for processing and then closing them. It should improve the overall performance of your solution.
When you are done processing a document you need to close it by using the Close method which closes the specified document.
Also when a new Word Application instance is created, don't forget to close it as well by calling the Quit method which quits Microsoft Word and optionally saves or routes the open documents.
Application.Quit SaveChanges:=wdSaveChanges, OriginalFormat:=wdWordDocument

Related

WordprocessingDocument.CreateFromTemplate method creates corrupted MS Word files

I have proper .dotm template.
When I create a new file based on a template by double clicking in explorer it creates the correct file (based on this template). Created file size after save is 16Kb (without any content).
But if I want to use .CreateFromTemplate method in my code I cannot open a newly created .docx file in MS Word.
New file size is 207Kb (just like .dotm file). MS Word display "run-time error 5398" and not open the file.
I'm using nuget package DocumentFormat.OpenXml 2.19.0, Word 365 version 16.0.14931.20648 - 32bit and code like this:
using (WordprocessingDocument doc = WordprocessingDocument.CreateFromTemplate(templatePath))
{
doc.SaveAs(newFileName);
}
Google is silent about this error, ChatGPT says that:
The "Run-time Error 5398" error means that the file you are trying to open is corrupted or not a valid docx file. Possible reasons for this error may be the following:
The file was not saved correctly after making changes. Verify that the Save() method was called after making changes to the file.
The file was saved with the wrong extension, e.g. as DOTM instead of DOCX
The file was saved in an invalid format.
There may have been some unhandled exceptions in your code.
When I manually change the extension of a new file from docx to dotm, there is no error when opening, but the file does not open.
What am I doing wrong with CreateFromTemplate method?
I tried to reproduce the behavior you described, using the following unit tests:
public sealed class CreateFromTemplateTests
{
private readonly ITestOutputHelper _output;
public CreateFromTemplateTests(ITestOutputHelper output)
{
_output = output;
}
[Theory]
[InlineData("c:\\temp\\MacroEnabledTemplate.dotm", "c:\\temp\\MacroEnabledDocument.docm")]
[InlineData("c:\\temp\\Template.dotx", "c:\\temp\\Document.docx")]
public void CanCreateDocmFromDotm(string templatePath, string documentPath)
{
// Let's not attach the template, which is done by default. If a template is attached, the validator complains as follows:
// The element has unexpected child element 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:attachedTemplate'.
using (var wordDocument = WordprocessingDocument.CreateFromTemplate(templatePath, false))
{
// Validate the document as created with CreateFromTemplate.
ValidateOpenXmlPackage(wordDocument);
// Save that document to disk so we can open it with Word, for example.
wordDocument.SaveAs(documentPath).Dispose();
}
using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(documentPath, true))
{
// Validate the document that was opened from disk, just to see what Word would open.
ValidateOpenXmlPackage(wordDocument);
}
}
private void ValidateOpenXmlPackage(OpenXmlPackage openXmlPackage)
{
OpenXmlValidator validator = new(FileFormatVersions.Office2019);
List<ValidationErrorInfo> validationErrors = validator.Validate(openXmlPackage).ToList();
foreach (ValidationErrorInfo validationError in validationErrors)
{
_output.WriteLine(validationError.Description);
}
if (validationErrors.Any())
{
// Note that Word will most often be able to open the document even if there are validation errors.
throw new Exception("The validator found validation errors.");
}
}
}
In both tests, the documents are created without an issue. Looking at the Open XML markup, both documents look fine. However, while I don't get any runtime error, Word also does not open the macro-enabled document.
I am not sure why that happens. It might be related to your security settings.
Depending on whether or not you really need to use CreateFromTemplate(), you could create a .docm (rather than a .dotm) and create new macro-enabled documents by copying that .docm.
I opened an issue in the Open XML SDK project on GitHub.

How to get MS Word total pages count using Open XML SDK?

I am using below code to get the page count but it is not giving actual page count(PFA). What is the better way to get the total pages count?
var pageCount = doc.ExtendedFilePropertiesPart.Properties.Pages.Text.Trim();
Note: We cannot use the Office Primary Interop Assemblies in my Azure web app service
Thanks in advance.
In theory, the following property can return that information from the Word Open XML file, using the Open XML SDK:
int pageCount = (int) document.ExtendedFilePropertiesPart.Properties.Pages.Text;
In practice, however, this isn't reliable. It might work, but then again, it might not - it all depends on 1) What Word managed to save in the file before it was closed and 2) what kind of editing may have been done on the closed file.
The only sure way to get a page number or a page count is to open a document in the Word application interface. Page count and number of pages is calculated dynamically, during editing, by Word. When a document is closed, this information is static and not necessarily what it will be when the document is open or printed.
See also https://github.com/OfficeDev/Open-XML-SDK/issues/22 for confirmation.
This code worked for me. It adds "Page X of Y" to the document.
para = new Paragraph(new Run(
new Text() { Text = "Page ", Space = SpaceProcessingModeValues.Preserve },
new SimpleField() { Instruction = "PAGE" },
new Text() { Text = " of ", Space = SpaceProcessingModeValues.Preserve },
new SimpleField() { Instruction = "NUMPAGES \\*MERGEFORMAT" }));

Unable to get visible texteditor in other viewcolumn

I'm trying to open the active file in a new viewcolumn, and fold a tag. The folding commands work fine in activeTextEditor:
// Fold based on linenumber
let range = editor.document.lineAt(lineNumber).range;
editor.selection = new vscode.Selection(range.start, range.end);
editor.revealRange(range);
commands.executeCommand('editor.fold');
Now I would like to do the same in a newly opened file:
// Open the same file in a new column
// at this time editor.ViewColum is One
commands.executeCommand('vscode.open', Uri.file(editor.document.fileName), ViewColumn.Two);
// Try to get that editor
let newEditor = vscode.window.visibleTextEditors.find(x=> x.viewColumn===viewColumn.Two && x.document.fileName===fileName)
The problem is that newEditor is not found, becouse the newly opened document has ViewColumn undefined.
Any idea how to solve this?
Thanks
The vs commands returns a promise. Needed to await that and everything works fine.

Wait while the console still has information to print

While developing an Eclipse plug-in, I want to output some paths in the console and then add some hyperlinks to those paths.
Instead of parsing the console, I thought about calculating the hyperlinks locations by the string which holds the path length which I print in console.
The problem which I have is that I try to create the hyperlink, but the information isn't printed yet in the console, therefore I get random bad location exceptions. It works if I run it in debug mode and I have a breakpoint on the hyperlink creation loop.
I tried to separate the creation of hyperlinks and wait until the messageQueue is empty, but that didn't work out.
private BlockingQueue<String> messageQueue = Queues.newLinkedBlockingQueue()
for ( String message : messages) {
messageQueue.offer(message);
// here I create a new HyperLinkInformation object and put it in a list
}
// wait until the queue is empty
while(!messageQueue.isEmpty()) {}
for (HyperLink hyperLinkObj : hyperlinkInformations) {
// try to create the hyperlink
}
Any ideas on how I could check if the console still has something to print?

Visual Studio Extension: Project context is applied to every opened file except the first one

I have a Visual Studio extension package where I am applying C++ syntax settings to custom file extensions. This is done in the Visual Studio's Text Editor options. Those files are plain text and I mean to have them behave as code files in the IDE (IntelliSense, find matching braces, etc...)
It's mostly working fine, but there is one problem. The C++ syntax context is not applied to whichever is the first file I open in a given Visual Studio session. I will launch Visual Studio, open one of our custom projects, and open one file. The IDE opens a document window and the file is opened, can be edited and saved, no problem in appearance. But the file behaves as a plain text and not a C++ source. Now, whenever I open a second file in the IDE, or any further file, the C++ settings do get applied successfully. I can close all document tabs, and open new ones, and all those tabs are fine. Even re-opening the first file in a new tab, or after re-loading the project or the solution, is fine. Only the first document opened in a Visual Studio session has the issue.
For the following segment, I will refer to the Microsoft documentation on using their standard editor: https ://msdn.microsoft.com/en-us/library/bb166504.aspx
To implement the OpenItem method with a standard editor
1.Call IVsRunningDocumentTable (RDT_EditLock) to determine whether the document data object file is already open.
2.If the file is already open, resurface the file by calling the IsDocumentOpen method, specifying a value of IDO_ActivateIfOpen for the grfIDO parameter.
If the file is open and the document is owned by a different project than the calling project, your project receives a warning that the editor being opened is from another project. The file window is then surfaced.
3.If the document is not open or not in the running document table, call the OpenStandardEditor method (OSE_ChooseBestStdEditor) to open a standard editor for the file.
When you call the method, the IDE performs the following tasks:
a.The IDE scans the Editors/{guidEditorType}/Extensions subkey in the registry to determine which editor can open the file and has the highest priority for doing this.
b.After the IDE has determined which editor can open the file, the IDE calls CreateEditorInstance. The editor's implementation of this method returns information that is required for the IDE to call CreateDocumentWindow and site the newly opened document.
c.Finally, the IDE loads the document by using the usual persistence interface, such as IVsPersistDocData2.
d.If the IDE has previously determined that the hierarchy or hierarchy item is available, the IDE calls GetItemContext method on the project to get a project-level context IServiceProvider pointer to pass back in with the CreateDocumentWindow method call.
4.Return an IServiceProvider pointer to the IDE when the IDE calls GetItemContext on your project if you want to let the editor get context from your project.
Performing this step lets the project offer additional services to the editor.
If the document view or document view object was successfully sited in a window frame, the object is initialized with its data by calling LoadDocData.
It definitely seems to me that I need to hit element (D) from the above instructions. I have debuged through my extension code, and I do see where my implementation of GetItemContext() comes into play. When I open most files, the code path does effectively go through this method, however it does not when I open the first file of a Visual Studio session.
Call stack from OpenStandardEditor
GetItemContext is invoked by the Microsoft assemblies and I do not know what is the condition that triggers whether it is called or not. I can only trace up to my call to the method OpenStandardEditor(), in FileDocumentManager.cs, then I don't know what happens beyond that. The above screenshot is the call stack when GetItemContext is successfully invoked, but when I'm opening the first file I'm totally in the dark as to what OpenStandardEditor is doing. I do know that in both cases, when the context is loaded and when it is not, the exact same parameter values are passed to OpenStandardEditor. So here's my code where this method is invoked, if that can be of some help:
My override of class DocumentManager:
private int Open(bool newFile, bool openWith, uint editorFlags, ref Guid editorType, string physicalView, ref Guid logicalView, IntPtr docDataExisting, out IVsWindowFrame windowFrame, WindowFrameShowAction windowFrameAction)
{
windowFrame = null;
if (this.Node == null || this.Node.ProjectMgr == null || this.Node.ProjectMgr.IsClosed)
{
return VSConstants.E_FAIL;
}
int returnValue = VSConstants.S_OK;
string caption = this.GetOwnerCaption();
string fullPath = this.GetFullPathForDocument();
// Make sure that the file is on disk before we open the editor and display message if not found
if (!((FileNode)this.Node).IsFileOnDisk(true))
{
// Inform clients that we have an invalid item (wrong icon)
this.Node.OnInvalidateItems(this.Node.Parent);
// Bail since we are not able to open the item
return VSConstants.E_FAIL;
}
IVsUIShellOpenDocument uiShellOpenDocument = this.Node.ProjectMgr.Site.GetService(typeof(SVsUIShellOpenDocument)) as IVsUIShellOpenDocument;
IOleServiceProvider serviceProvider = this.Node.ProjectMgr.Site.GetService(typeof(IOleServiceProvider)) as IOleServiceProvider;
try
{
int result = VSConstants.E_FAIL;
if (openWith)
{
result = uiShellOpenDocument.OpenStandardEditor((uint)__VSOSEFLAGS.OSE_UseOpenWithDialog, fullPath, ref logicalView, caption, this.Node.ProjectMgr, this.Node.ID, docDataExisting, serviceProvider, out windowFrame);
}
else
{
__VSOSEFLAGS openFlags = 0;
if (newFile)
{
openFlags |= __VSOSEFLAGS.OSE_OpenAsNewFile;
}
//NOTE: we MUST pass the IVsProject in pVsUIHierarchy and the itemid
// of the node being opened, otherwise the debugger doesn't work.
if (editorType != Guid.Empty)
{
result = uiShellOpenDocument.OpenSpecificEditor(editorFlags, fullPath, ref editorType, physicalView, ref logicalView, caption, this.Node.ProjectMgr, this.Node.ID, docDataExisting, serviceProvider, out windowFrame);
}
else
{
openFlags |= __VSOSEFLAGS.OSE_ChooseBestStdEditor;
// THIS IS THE CALL THAT I'M ALWAYS INVOKING. PARAMS ARE ALWAYS THE SAME, BUT ITEM CONTEXT IS NOT ACTIVATED FOR FIRST FILE OF A SESSION.
result = uiShellOpenDocument.OpenStandardEditor((uint)openFlags, fullPath, ref logicalView, caption, this.Node.ProjectMgr, this.Node.ID, docDataExisting, serviceProvider, out windowFrame);
}
}
if (result != VSConstants.S_OK && result != VSConstants.S_FALSE && result != VSConstants.OLE_E_PROMPTSAVECANCELLED)
{
ErrorHandler.ThrowOnFailure(result);
}
if (windowFrame != null)
{
object var;
if (newFile)
{
ErrorHandler.ThrowOnFailure(windowFrame.GetProperty((int)__VSFPROPID.VSFPROPID_DocData, out var));
IVsPersistDocData persistDocData = (IVsPersistDocData)var;
ErrorHandler.ThrowOnFailure(persistDocData.SetUntitledDocPath(fullPath));
}
var = null;
ErrorHandler.ThrowOnFailure(windowFrame.GetProperty((int)__VSFPROPID.VSFPROPID_DocCookie, out var));
this.Node.DocCookie = (uint)(int)var;
if (windowFrameAction == WindowFrameShowAction.Show)
{
ErrorHandler.ThrowOnFailure(windowFrame.Show());
}
else if (windowFrameAction == WindowFrameShowAction.ShowNoActivate)
{
ErrorHandler.ThrowOnFailure(windowFrame.ShowNoActivate());
}
else if (windowFrameAction == WindowFrameShowAction.Hide)
{
ErrorHandler.ThrowOnFailure(windowFrame.Hide());
}
}
}
catch (COMException e)
{
Trace.WriteLine("Exception e:" + e.Message);
returnValue = e.ErrorCode;
this.CloseWindowFrame(ref windowFrame);
}
return returnValue;
}
I have also tried an alternative. In the call stack where I perform DoDefaultAction on my FileNode (extends HierarchyNode), I normally call an instance of my DocumentManager.Open() directly. I have changed that to try OpenDocumentViaProject() instead. Now, the MSENV assembly turns out to call my GetItemContext, then goes out to my implementation of DocumentManager.Open I quoted above.
Call stack from OpenDocumentViaProject
Sounds promising... but no. Beyond the screenshot above, once I call OpenStandardEditor the exact same behavior happens. No project context is applied to the first document opened in a session, and the context is applied to every further file. The call to GetItemContext() that is done by OpenDocumentViaProject() does not seem to matter in the slightest. Only when OpenStandardEditor() also ends up calling GetItemContext() somewhere downstream does the project settings I want get applied.
I don't see where I would be doing something fundamentally wrong. It seems to me that I am following the Mimcrosoft instructions on opening standard editors. Would you have a clue as to how my GetItemContext implementation is not invoked when I'm opening the first file of a VS session? Thanks