How does Powershell access Word and Word objects? - powershell

I am using the following script to convery a directory full of txt and rtf documents to Word documents.
$global:word = new-object -comobject word.application
$word.Visible = $False
$srcfiles = \\Path\to\files
$savepath = \\Path\to\docfiles
$saveFormatDoc = [Enum]::Parse([Microsoft.Office.Interop.Word.WdSaveFormat], 0);
function saveas-document ($docs) {
"Opening $($docs.FullName)"
$opendoc = $word.documents.open($docs.FullName)
$savepath = "$docPath$($docs.BaseName)"
"Converting to $savepath.doc"
$opendoc.saveas([ref]"$savepath", [ref]$saveFormatDoc)
}
ForEach ($doc in $srcfiles) {
saveas-document -docs $doc
}
With the way it's coded right now, I think it's opening a new instance of Word, opening the rtf/txt document in Word, saving as a Word doc and repeating. I'm not quitting Word (unless there's an implicit $word.quit() in there somewhere. I fear I'm killing performance with multiple instances of Word, but don't know if that's a concern I need to have or not.
What's a more efficient way of coding this to open a single instance of Word for document handling, and what's the syntax for opening Word as an application rather than opening a series of documents that launch the Application based on their file associations (which is what I fear it's doing now)?

Your New-Object is outside your loop so you're only using the one instance. When you assign the New-Object to a variable (the object reference) and reuse it, that same object get's reused unless you destroy it or create another. Your script, in it's current form, is not creating a new object for each $doc in $srcfiles as you might be thinking.

With COM Objects, you are not launching the application, or dozens of them. You are opening a binary interface to libraries
You don't need or want to open a Com Object over and over, as that would cause a delay and performance hit.
You should close the Com-Object ($word.Quit()) at the end of the script though.

Related

Microsoft.XMLDOM.1.0 - "the data necessary to complete this operation is not available"

I'm trying to load a XML with the following code:
$xsl = new-object -ComObject Microsoft.XMLDOM.1.0
$xsl.load('http://172.16.177.200/1.xml')
$xsl.transformNode($xsl)
However, $xsl.load only work if the XML is a local file. With the code above, transformNode raises an error "the data necessary to complete this operation is not available"
Tried to add a start-sleep 5, no help.
Any ideas?
Microsoft.XMLDOM has been deprecated for years. In COM land you should use Msxml2.DOMDocument (more specifically Msxml2.DOMDocument.6.0). Related.
With COM XML objects you should also disable asynchronous processing, so that loading/parsing the XML document is completed before the next instruction.
$xsl = New-Object -ComObject 'Msxml.DOMDocument.6.0'
$xsl.Async = $false
$xsl.Load('http://172.16.177.200/1.xml')
With that said, since PowerShell is built on top of .Net it's recommended to work with .Net rather than COM:
$xsl = New-Object Xml
$xsl.Load('http://172.16.177.200/1.xml')
In .Net land loading/parsing the XML file is synchronous by default. You will need a replacement for TransformNode(), though. See here.

creating comobject for AccessObjects

I am trying to create a powershell 'AccessObject' comobject for my MS Access app. Basically, I will trying to create a powershell script that gets queries in a database and the tables and/or queries a particular query depends on. To do that i will need to have an instance of the MS Access 'AccessObject' and 'DependencyInfo' classes in my powershell script. I have attached a snippet of the function i intend to use. This is not the complete function, please note. All i want is to know how to create an instance of the DependencyInfo and AccessObjects in powershell.
function getQueries([string] $database)
{
$dbEng = New-Object -ComObject DAO.DBEngine.120
$AccessApp= New-Object -ComObject Access.Application
$Dependency = $AccessApp.DependencyInfo
$AccessObject=$AccessApp.AccessObjects
...
}
All i want is to know how to create an instance of the DependencyInfo
and AccessObjects in powershell.
The following creates a new Access process, opens a local accdb file, and retrieves the dependencies for a given form:
$db = new-object -ComObject 'Access.Application'
$db.OpenCurrentDatabase('C:\temp\deezNutz.accdb')
$dependency_info = $db.Application.CurrentProject().AllForms('frm_person').GetDependencyInfo()
foreach ($dependency in $dependency_info.Dependencies) { $dependency.FullName }
$db.CloseCurrentDatabase()
$db.Application.DoCmd.Quit()
If you're trying to pro grammatically manipulate the objects in a Microsoft Access database e.g., forms, reports, queries, etc. Your best bet is to search for solutions using VBA then convert those to Powershell. For this example, I first wrote the solution in VBA then converted it to Powershell.
Thanks #Lord Adam. This was really helpful. In my case i had to modify the logic a little bit:
$AccessApp= New-Object -ComObject 'Access.Application'
$AccessApp.OpenCurrentDatabase($database)
$AccessApp.Application.SetOption("Track Name AutoCorrect Info", $true)
$QryDependency = $AccessApp.Application.CurrentData.AllQueries.Item($query.Name).GetDependencyInfo()
ForEach($di in $QryDependency.Dependencies)
{
$QryObjects= $QryObjects + $di.Name +","
}

Using querySelectorAll on an mshtml.HTMLDocumentClass object in PowerShell causes a crash

I'm trying to do some web-scraping via PowerShell, as I've recently discovered it is possible to do so without too much trouble.
A good starting point is to just fetch the HTML, use Get-Member, and see what I can do from there, like so:
$html = Invoke-WebRequest "https://www.google.com"
$html.ParsedHtml | Get-Member
The methods available to me for fetching specific elements appear to be the following:
getElementById()
getElementsByName()
getElementsByTagName()
For example I can get the first IMG tag in the document like so:
$html.ParsedHtml.getElementsByTagName("img")[0]
However after doing some more research in to whether I could use CSS Selectors or XPath I discovered that there are unlisted methods available, since we are just using the HTML Document object documented here:
querySelector()
querySelectorAll()
So instead of doing:
$html.ParsedHtml.getElementsByTagName("img")[0]
I can do:
$html.ParsedHtml.querySelector("img")
So I was expecting to be able to do:
$html.ParsedHtml.querySelectorAll("img")
...in order to get all of the IMG elements. All the documentation I've found and googling I've done supports this. However, in all my testing this function crashes the calling process and reports a heap corruption exception code in the Event Log (0xc0000374).
I'm using PowerShell 5 on Windows 10 x64. I've tried it in a Win10 x64 VM that is a clean build and just patched up. I've also tried it in Win7 x64 upgraded to PowerShell 5. I haven't tried it on anything prior to PowerShell 5 as all our systems here are upgraded, but I probably will once I have time to spool a new vanilla VM for testing.
Has anyone run in to this issue before? All my research so far is a dead end. Are there alternatives to querySelectorAll? I need to scrape pages that will have predictable sets of tags inside unpredictable layouts and potentially no IDs or classes assigned to the tags, so I want to be able to use selectors that allow structure/nesting/wildcards.
P.S. I've also tried using the InternetExplorer.Application COM object in PowerShell, the result is the same, except instead of PowerShell crashing Internet Explorer crashes. This was actually my original approach, here's the code:
# create browser object
$ie = New-Object -ComObject InternetExplorer.Application
# make browser visible for debugging, otherwise this isn't necessary for function
$ie.Visible = $true
# browse to page
$ie.Navigate("https://www.google.com")
# wait till browser is not busy
Do { Start-Sleep -m 100 } Until (!$ie.Busy)
# this works
$ie.document.getElementsByTagName("img")[0]
# this works as well
$ie.document.querySelector("img")
# blow it up
$ie.document.querySelectorAll("img")
# we wanna quit the process, but since we blew it up we don't really make it here
$ie.Quit()
Hope I'm not breaking any rules and this post makes sense and is relevant, thanks.
UPDATE
I tested earlier PowerShell versions. v2-v4 crash using the InternetExplorer.Application COM method. v3-4 crash using the Invoke-WebRequest method, v2 doesn't support it.
I ran into this problem, too, and posted about it on reddit. I believe the problem happens when Powershell tries to enumerate the HTML DOM NodeList object returned by querySelectorAll(). The same object is returned by childNodes() which can be enumerated by PS, so I'm guessing there's some glue code written for .ParsedHtml.childNodes but not .ParsedHtml.querySelectorAll(). The crash can be triggered by Intellisense trying to get tab-complete help for the object, too.
I found a way around it, though! Just access the native DOM methods .item() and .length directly and emit the node objects into a PowerShell array. The following code pulls the newest page of posts from /r/Powershell, gets the post list anchors via querySelectorAll() then manually enumerates them using the native DOM methods into a Powershell-native array.
$Result = Invoke-WebRequest -Uri "https://www.reddit.com/r/PowerShell/new/"
$NodeList = $Result.ParsedHtml.querySelectorAll("#siteTable div div p.title a")
$PsNodeList = #()
for ($i = 0; $i -lt $NodeList.Length; $i++) {
$PsNodeList += $NodeList.item($i)
}
$PsNodeList | ForEach-Object {
$_.InnerHtml
}
Edit .Length seems to work capitalized or lower-case. I would have expected the DOM to be case-sensitive, so either there's some things going on to help translate or I'm misunderstanding something. Also, the CSS selector is grabbing the source links (self.PowerShell mostly), but that it my CSS selector logic error, not a problem with querySelectorAll(). Note that the results of querySelectorAll() are not live, so modifying them won't modify the original DOM. And I haven't tried modifying them or using their methods yet, but clearly we can grab at the very least .InnerHtml.
Edit 2: Here is a more-generalized wrapper function:
function Get-FixedQuerySelectorAll {
param (
$HtmlWro,
$CssSelector
)
# After assignment, $NodeList will crash powershell if enumerated in any way including Intellisense-completion while coding!
$NodeList = $HtmlWro.ParsedHtml.querySelectorAll($CssSelector)
for ($i = 0; $i -lt $NodeList.length; $i++) {
Write-Output $NodeList.item($i)
}
}
$HtmlWro is an HTML Web Response Object, the output of Invoke-WebReqest. I originally tried to pass .ParsedHtml but then it would crash on assignment. Doing it this way returns the nodes in a Powershell array.
The #midnightfreddie's solution worked fine for me before, but now it throws Exception from HRESULT: 0x80020101 when calling $NodeList.item($i).
I found the following workaround:
function Invoke-QuerySelectorAll($node, [string] $selector)
{
$nodeList = $node.querySelectorAll($selector)
$nodeListType = $nodeList.GetType()
$result = #()
for ($i = 0; $i -lt $nodeList.length; $i++)
{
$result += $nodeListType.InvokeMember("item", [System.Reflection.BindingFlags]::InvokeMethod, $null, $nodeList, $i)
}
return $result
}
This one works for New-Object -ComObject InternetExplorer.Application as well.

Multiple Actions for one Outlook rule in Powershell

I have a question about Outlook Rules in Powershell. I wrote some code that successfully stores any incoming e-mail from a certain sender to the deleted-items folder. I did this because when the mails enter the junk folder, the junk folder still has the counter token of mails, so in the end it will say I have 10-mails in the junk folder.
I want to avoid this by just throwing the incoming mails from that sender to the deleted-items folder and also marking the mail as "read" so that I don't see the clutter in the deleted items folder.
The question is really:
Can I add multiple actions to the same outlook rule in powershell? if so, how?
What is the syntax / code for the "run script" action?
My code so far:
$ol = New-Object -ComObject Outlook.Application
$ns = $ol.GetNamespace("MAPI")
$olFolders = "Microsoft.Office.Interop.Outlook.OlDefaultFolders"
$outlook = New-Object -ComObject outlook.application
$namespace = $Outlook.GetNameSpace("MAPI")
$inBox = $ns.GetDefaultFolder([Microsoft.Office.Interop.Outlook.OlDefaultFolders]::olFolderInbox)
$deleted = $ns.GetDefaultFolder([Microsoft.Office.Interop.Outlook.OlDefaultFolders]::olFolderDeletedItems)
$rules = $outlook.session.DefaultStore.GetRules()
$rule = $rules.create("Move mail: to DeletedItems", [Microsoft.Office.Interop.Outlook.OlRuleType]::olRuleReceive)
$rule_Address = $rule.Conditions.SenderAddress
$rule_Address.Enabled = $true
$rule_Address.Address = #("<Sender Address>")
$action = $rule.Actions.MoveToFolder
$action.Enabled = $true
[Microsoft.Office.Interop.Outlook._MoveOrCopyRuleAction].InvokeMember("Folder",[System.Reflection.BindingFlags]::SetProperty,$null, $action, $deleted)
$rules.Save()
This code works so far.
Please help.
Thanks!
Can I add multiple actions to the same outlook rule in powershell? if so, how?
Took a bit but I got a working test that uses multiple actions applied to a single rule. It is actually easy and you just need to repeat the steps you have already done and create a different action variable.
In my example, just showing the end of the code, I have added a action to display a message in the New Item Alert window.
...
$action = $rule.Actions.MoveToFolder
$action.Enabled = $true
$anotherAction = $rule.Actions.NewItemAlert
$anotherAction.Text = "I am awesome!"
$anotherAction.Enabled = $true
[Microsoft.Office.Interop.Outlook._MoveOrCopyRuleAction].InvokeMember("Folder",[System.Reflection.BindingFlags]::SetProperty,$null, $action, $deleted)
$rules.Save()
You quite possibly already tried something like this. If not there is an important reference for this that you need to be aware of.
What is the syntax / code for the "run script" action?
This is one of the actions you cannot programatically set as per this reference for Office 2007 or this one for Office 2010/2013. The tables are similar and rather large to include here but I will reference the one in your 2nd bullet.
Action : Start a script
Constant in olRuleActionType : olRuleActionRunScript
Supported when creating new rules programmatically? : No
Apply to olRuleReceive rules? : Yes
Apply to olRuleSend rules? : No
There are others as well where you are restricted. So you need to keep that in mind when you are making your rules.

Read missing Word template from Powershell

I have a series of Word documents which link to templates which no longer exist. This is causing problems for users trying to open them. I can get a list of the documents, loop through each one, and set the tempalte to null. While this will solve the problem, I can't determine what the template was before I changed it.
In cases where the template is not available on open, Word will replace the attached template with Normal.dot(x). However, the template I'm trying find is located in the document's Tempaltes dialog. Both AttachedTempalte() and get_AttachedTemplate().Name return Normal.dot when I know the document in question has a different template listed in the Templates dialog in word.
I can access this in VBA, and it's fustrating to not be able to do this in PS. Can anyone see where I'm messing up?
$word = new-object -comobject "Word.Application"
$doc = $word.Documents.Open({document path})
$word.Dialogs(Microsoft.Office.Interop.Word.WdWordDialog.wdDialogToolsTemplates).Template()
Returns:
Missing ')' in method call.
At :line:1 char:15
+ $word.Dialogs(M <<<< icrosoft.Office.Interop.Word.WdWordDialog.wdDialogToolsTemplates).Template()
Working VBA:
Dim doc as Word.Document
Dim strTemplate as String
Set doc = Documents.Open(Filename:=filename, Visible:=False)
doc.Activate
strTemplate = Word.Dialogs(wdDialogsToolsTemplates).Template
After which I can see the template name and path I should see in strTemplate.
I checked the ps script and adding $doc.Activate doesn't seem to help. I also noticed that the interop and VBA do not use the same wdDialog. PS uses wdDialogToolsTemplates and VBA using wdDialogsToolsTemplates. I checked the assembly in PS with the following
[Reflection.Assembly]::LoadWithPartialName("Microsoft.Office.Interop.Word") | out-null
[Enum]::GetNames("Microsoft.Office.Interop.Word.WdDialogs")
and confirmed the correct option is wdDialogToolsTemplates.
In powershell you must use the [] brackets to specify types and then the :: to specify the type member, so your 3rd line of powershell code should look like this:
$word.Dialogs([Microsoft.Office.Interop.Word.WdWordDialog]::wdDialogToolsTemplates).Template()
See these blog posts about powershell enums: Jeffrey Snover or Richard Siddaway.
I am trying to do something similar and the main aim is not to store any code inside a Word document.
PowerShell
I made a tiny bit of progress with the PowerShell route but I can't find a way to extract the path from the dialog boxes.
$objWord = New-Object -ComObject "Word.Application"
$objWord.Visible = $True
$objDoc = $objWord.Documents.Open("C:\path\to\document.doc")
$objToolsTemplates = $objWord.Dialogs.Item([Microsoft.Office.Interop.Word.WdWordDialog]::wdDialogToolsTemplates)
$objDocStats = $objWord.Dialogs.Item([Microsoft.Office.Interop.Word.WdWordDialog]::wdDialogDocumentStatistics)
Now $objToolsTemplates.Show() and $objToolsTemplates.Show() both cause the GUI to display dialogs containing the path to the original (unavailable) template but I can't find any way to extract that information programatically. According to MSDN WdWordDialog Enumeration documentation both of the dialogs should have a Template property.
VBScript
In the end I had to go with VBS instead. The following code will give you the path of the original (unavailable) template. It's still a script at least.
Option Explicit
Const wdDialogToolsTemplates = 87
Const wdDoNotSaveChanges = 0
Dim objWordApp
Dim objDoc
Dim dlgTemplate
Set objWordApp = CreateObject("Word.Application")
objWordApp.Visible = False
Set objDoc = objWordApp.Documents.Open("C:\path\to\document.doc")
Set dlgTemplate = objWordApp.Dialogs(wdDialogToolsTemplates)
Wscript.Echo dlgTemplate.Template
objDoc.Close(wdDoNotSaveChanges)
objWordApp.Quit