MS Word's range arithmetic in PowerShell

I'm trying to automate adding new tables to a Word document using PowerShell.
I wrote a PowerShell script that adds summary tables, built from the whole document, at the proper locations. It gathers information from the file contents and then creates a new summary table in each selected range. The table is always inserted at the end of the range (a chapter in the document), and the range is delimited by the list headers. However, when I add a new table to the selected range, I cannot make Word keep the next header, which is chosen as the end of the range. It gets deleted.
For example: my file has chapters 1.1 to 1.10, and I choose to add a new table at the end of chapter 1.1, right before chapter 1.2. The whole chapter 1.2 header is deleted, and chapter 1.3 is now labeled 1.2.
I tried subtracting various numbers from the Range.End property, following the Microsoft documentation (https://learn.microsoft.com/en-us/office/vba/api/word.range.end), but it doesn't seem to have any effect.
The code (shortened):
Add-Type -AssemblyName Microsoft.Office.Interop.Word
$word = New-Object -ComObject Word.application
$report = $word.Documents.Open("C:\file.docx")
#information gathering here
#find the right location and add a new table
$start = $report.Paragraphs | ? {$_.Range.ListFormat.ListString -eq '1.1'} | % {$_.Range}
$end = $report.Paragraphs | ? {$_.Range.ListFormat.ListString -eq '1.2'} | % {$_.Range}
$first_table = $report.Range($start.Start, $end.Start).Tables.Add($end, 24, 4, [ref]$DefaultTableBehavior::wdWord9TableBehavior, [ref]$AutoFitBehavior::wdAutoFitFixed)
#continue with filling up the table

Part of the problem is that if you have a sequence of empty auto-numbered paragraphs, inserting a table into one of them will mean that subsequent paragraphs will be placed inside the table.
Also, if your 1.1 section can contain material, AFAICS your $start will contain only the range of its first paragraph, which isn't very useful. All you really need is the location of the paragraph you want to insert before.
As an alternative, I suggest that you start by inserting a paragraph mark immediately before the 1.2 heading, then insert the table.
e.g. like this:
#$start = $report.Paragraphs | ? {$_.Range.ListFormat.ListString -eq '1.1'} | % {$_.Range}
$end = $report.Paragraphs | ? {$_.Range.ListFormat.ListString -eq '1.2'} | % {$_.Range}
$place = $report.Range($end.Start - 1, $end.Start-1)
$place.InsertParagraph()
$place = $report.Range($end.Start - 1, $end.Start-1)
$first_table = $place.Tables.Add($place, 24, 4, [ref]$DefaultTableBehavior::wdWord9TableBehavior, [ref]$AutoFitBehavior::wdAutoFitFixed)
If you don't want that extra paragraph mark, you can probably delete it, and if you really want, you could use
$report.Range($end.Start-1, $end.Start).Text = ""
after the table insertion.

Related

PowerShell - How can I select the last row of the last table in a Word document, copy it, and insert as a new row at the bottom of the table?

I am trying to increase the level of automation within my day-to-day work, part of which involves adding a line to the end of a table within a report that contains largely the same information, with a few cells changed (new dates).
I have a little experience with VB and C++, but I am very much an amateur when it comes to PowerShell, which seems to be the go-to for task automation.
I have a couple of PowerShell scripts that search through the body of the report and change text, but the last part of the report is a record, and needs appending as opposed to amending.
How would I go about this?
I have tried mangling a few bits of PowerShell code I've found online, to no avail. I have gotten as far as selecting a row in the correct table, but I have no idea how I might then select the last row, copy this and insert it beneath as a new row at the bottom of the table:
$objWord = New-Object -ComObject word.application
$objWord.Visible = $True
$objWord.Documents.Open("FILEPATH")
$FindText = "KEYTEXT"
$objWord.Selection.Find.Execute($FindText)
$objWord.Selection.SelectRow()
Here's a lengthy example with explanations that does what you're looking for:
# open document
$objWord = New-Object -ComObject word.application
$objWord.Visible = $True
$doc = $objWord.Documents.Open("C:\temp\temp.docx")
# search for row
$FindText = "Value2"
$result = $objWord.Selection.Find.Execute($FindText)
$objWord.Selection.SelectRow()
# copy the ranges from searched row
$table = $objWord.Selection.Tables[1]
$copiedCells = $objWord.Selection.Cells | select columnindex,rowindex
# add row at the end of table
$table.Rows[$table.Rows.Count].Select()
$objWord.Selection.InsertRowsBelow(1)
# insert copied text into each column of last row
foreach ($cell in $copiedCells) {
# copy value from cell in copied row
$copiedText = $table.Cell($cell.rowindex, $cell.columnindex).Range.Text
# remove last 2 characters (paragraph and end-of-cell)
$TrimmedText = $copiedText.Remove($copiedText.Length - 2)
# set value for cell in last row
$table.Cell($table.Rows.Count,$cell.columnindex).Range.Text = $TrimmedText
}
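The trimming step above relies on every cell's Range.Text ending in a paragraph mark plus the end-of-cell marker (character codes 13 and 7). A minimal sketch of a helper that strips those markers by character rather than by count, so an unexpectedly short string does not make Remove() throw. The name Get-CleanCellText is hypothetical and not part of Word's object model:

```powershell
# Hypothetical helper: strip Word's trailing cell markers from a cell's text.
function Get-CleanCellText {
    param([string]$CellText)
    # 13 = paragraph mark, 7 = end-of-cell marker.
    # TrimEnd removes every trailing occurrence, including extra empty paragraphs.
    return $CellText.TrimEnd([char[]](13, 7))
}

# Simulated cell text, shaped like Range.Text for a cell containing "Value2".
$raw = "Value2" + [char]13 + [char]7
Get-CleanCellText -CellText $raw
```

Inside the loop this could replace the Remove() call, e.g. $TrimmedText = Get-CleanCellText -CellText $copiedText. Note that it also drops trailing blank paragraphs inside a cell, which may or may not be what you want.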

powershell extracting data from strings or other suggestions

I am writing a script that reads data from an Excel document generated by another tool. It lists file ages in the format shown below. I would like to process each cell value and change the cell color based on that value: anything older than 1 year becomes red, and anything 90+ days old becomes yellow/orange.
After a bit of research, I elected to use an if statement to detect when the age is greater than 0 years, which seems to work fine. However, for the days portion I'm not sure how to extract just the digits to the left of the d in each cell, stopping at the y if one is present. Alternatively, I could read the leading digits only when $_ contains d, and then check whether that value is -gt 90. I am unsure how to extract a variable-length string only if it consists of digits to the left of a given character. I considered combining the method below, which finds a character's position, with returning everything up to the y, or something else.
Find character position and update file name
Possible Age Formats:
13y170d
3y249d
8h7m
1y109d
1y109d
1y109d
5d22h
3y281d
3y184d
11y263d
7m25s
1h14m
[regex]$years = "\d{1,3}[0-9]y"
[regex]$days_90 = "\d{0,3}[0-9]d"
conditionally formatting/coloring row based on age (years)
if ( $( A$_ -match "$years") -eq $True ) {
$($test_home).$("Last Accessed") | ForEach-Object { $( $($_.Contains("y") -eq $True ) { New-ConditionalText -Text Red } }
conditionally formatting/coloring row based on age (90+ days)
if ( $( A$_ -match "$days_90") -eq $True ) { New-ConditionalText -Text Yellow }
What you are after is a positive lookbehind combined with a positive lookahead. Effectively, it gets the text between two characters or sets, which is really handy if you have a consistently formatted set of data to work with.
[regex]$days_90 = '(?<=y).*?(?=d)'
. Matches any characters without line breaks.
* Matches 0 or more of the preceding token.
? Makes the regex lazy and try to match as few as possible.
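One caveat: the lookbehind (?<=y) requires a y before the digits, so a value such as 5d22h produces no match. A minimal sketch of an alternative that uses plain capture groups and handles both shapes. The function name Get-AgeInDays is hypothetical, and treating a year as 365 days is an assumption, as is ignoring hours, minutes and seconds:

```powershell
# Hypothetical helper: convert an age string such as "1y109d" or "5d22h" to whole days.
# Assumes 1 year = 365 days; hours, minutes and seconds are ignored.
function Get-AgeInDays {
    param([string]$Age)
    $years = 0
    $days = 0
    # (\d+) captures the variable-length run of digits directly before the unit letter.
    if ($Age -match '(\d+)y') { $years = [int]$Matches[1] }
    if ($Age -match '(\d+)d') { $days = [int]$Matches[1] }
    return ($years * 365) + $days
}

foreach ($age in '13y170d', '5d22h', '8h7m', '1y109d') {
    $total = Get-AgeInDays -Age $age
    $color = if ($total -ge 365) { 'Red' } elseif ($total -ge 90) { 'Yellow' } else { 'None' }
    "{0,-8} {1,5} days  {2}" -f $age, $total, $color
}
```

The resulting number can then drive whatever coloring call the script uses. The thresholds here simply mirror the one-year and 90-day rules from the question.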

Replace string until its length is less than limit with PowerShell

I am trying to update users' AD account properties with values imported from a CSV file.
The problem is that some properties, such as department, allow strings of at most 64 characters, while the values provided in the file can be up to 110 characters long.
I found and adapted the solution provided by TroyBramley in this thread - How to replace multiple strings in a file using PowerShell (thank you, Troy).
It works fine, but after all the replacements have taken place, the text is less meaningful than it was originally.
For example, original text First Department of something1 something2 something3 something4 would result in 1st Dept of sth1 sth2 sth3 sth4
I'd like to have control over the process so that I can stop it as soon as the length of the string drops just under the limit allowed by the AD property.
I'd also like to choose which replacement is applied first, second, and so on.
I put the elements in a hashtable alphabetically, but it seems they are not processed in that order. I can't figure out the pattern.
I could solve this by replacing strings one by one and checking the length after each replacement, but with almost 70 strings that leads to a huge amount of code. Maybe there is a simpler way?
You can iterate the replacement list until the string reaches the MaxLength defined.
## Q:\Test\2018\06\26\SO_51042611.ps1
$Original = "First Department of something1 something2 something3 something4"
$list = New-Object System.Collections.Specialized.OrderedDictionary
$list.Add("First","1st")
$list.Add("Department","Dept")
$list.Add("something1","sth1")
$list.Add("something2","sth2")
$list.Add("something3","sth3")
$list.Add("something4","sth4")
$MaxLength = 40
ForEach ($Item in $list.GetEnumerator()){
$Original = $Original -Replace $Item.Key,$Item.Value
If ($Original.Length -le $MaxLength){Break}
}
"{0}: {1}" -f $Original.Length,$Original
Sample output with $MaxLength set to 40
37: 1st Dept of sth1 sth2 sth3 something4
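The ordering puzzle in the question comes from the plain hashtable: @{} does not preserve insertion order, whereas an OrderedDictionary does. A minimal sketch of the same loop wrapped in a function, built with the [ordered] accelerator (PowerShell 3.0 or later). The name Limit-StringLength is hypothetical. Two deliberate differences from the code above: the length is checked before each replacement, so text that already fits is left untouched, and String.Replace is used so keys match literally rather than as regular expressions:

```powershell
# Hypothetical helper: apply abbreviations in order until the text fits.
function Limit-StringLength {
    param(
        [string]$Text,
        [System.Collections.Specialized.OrderedDictionary]$Replacements,
        [int]$MaxLength
    )
    foreach ($item in $Replacements.GetEnumerator()) {
        # Stop as soon as the text fits, so later abbreviations are not applied.
        if ($Text.Length -le $MaxLength) { break }
        # String.Replace matches literally and case-sensitively, unlike -replace.
        $Text = $Text.Replace([string]$item.Key, [string]$item.Value)
    }
    return $Text
}

# [ordered] keeps the entries in the order they are written.
$abbreviations = [ordered]@{
    'First'      = '1st'
    'Department' = 'Dept'
    'something1' = 'sth1'
    'something2' = 'sth2'
}

Limit-StringLength -Text 'First Department of something1 something2' -Replacements $abbreviations -MaxLength 30
```

With these values the call returns 1st Dept of sth1 something2 (27 characters): the last abbreviation is skipped because the text already fits after the third one.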

Read word document (*.doc) content with tables etc

I have a word document (2003). I am using Powershell to parse the content of the document.
The document contains a few lines of text at the top, a dozen tables with differing number of columns and then some more text.
I expect to be able to read the document as something like the below:
Read document (make necessary objects etc)
Get each line of text
If not part of a table, process as text and Write-Output
else
If part of a table
Get table number (by order) and parse output based on columns
end if
Below is the powershell script that I have begun to write:
$objWord = New-Object -Com Word.Application
$objWord.Visible = $false
$objDocument = $objWord.Documents.Open($filename)
$paras = $objDocument.Paragraphs
foreach ($para in $paras)
{
Write-Output $para.Range.Text
}
I am not sure if Paragraphs is what I want. Is there anything more suitable for my purpose?
All I am getting now is the entire content of the document. How do I control what I get? For example, I want to get a line, determine whether it is part of a table, and take an action based on which table (by number) it belongs to.
You can enumerate the tables in a Word document via the Tables collection. The Rows and Columns properties will allow you to determine the number of rows/columns in a given table. Individual cells can be accessed via the table's Cell method, which takes a row and a column index.
Example that will print the value of the cell in the last row and last column of each table in the document:
$wd = New-Object -ComObject Word.Application
$wd.Visible = $true
$doc = $wd.Documents.Open($filename)
$doc.Tables | ForEach-Object {
$_.Cell($_.Rows.Count, $_.Columns.Count).Range.Text
}
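For the other half of the question, telling ordinary text apart from table content, one option is the Range.Information property with the wdWithInTable constant (value 12). A minimal sketch under that assumption. The name Get-BodyParagraphText is hypothetical, and $Document stands for the object returned by Documents.Open:

```powershell
# Hypothetical helper: return the text of paragraphs that are not inside any table.
function Get-BodyParagraphText {
    param($Document)
    $wdWithInTable = 12   # WdInformation constant
    foreach ($para in $Document.Paragraphs) {
        if (-not $para.Range.Information($wdWithInTable)) {
            # Drop the trailing paragraph mark before emitting the text.
            $para.Range.Text.TrimEnd([char]13)
        }
    }
}

# Usage, assuming $doc came from $wd.Documents.Open($filename):
# Get-BodyParagraphText -Document $doc
```

The tables themselves can then be handled separately through $doc.Tables as shown above, which avoids having to work out a table number from each paragraph.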

Trying to use Powershell to remove all sentences flagged by the Microsoft Word Grammar Checker

I'm trying to use PowerShell to remove all sentences flagged by the Microsoft Word Grammar Checker. I got pretty far by looking at the Office Word 2010 Word Object Model. I was able to find the next grammatically incorrect sentence in a document and delete it. My only problem now is looping through the document to delete all of the sentences flagged by the grammar checker. Here's what I have so far.
cd c:\testruns\
$docPath = "" + $(Get-Location) + "\Grammar\document.docx"
$Word = New-Object -ComObject Word.Application
$Word.Visible = $True
$doc = $Word.documents.open($docPath)
$docSelection = $Word.selection
# Word Method Constants
$wdGoToSpellingError = 13
$wdGoToGrammaticalError = 14
$wdGoToFirst = 1
$wdGoToLast = -1
$wdGoToNext = 2
while (!$AnymoreGrammar) {
[void]$docSelection.GoTo($wdGoToGrammaticalError, $wdGoToNext).delete()
}
Of course, the variable $AnymoreGrammar is just pseudocode for a boolean test that I still need to find. I need a valid boolean test in the while loop that checks whether the document has any more grammatical errors. If I don't have one, then $wdGoToNext will keep going even when there are no grammatical errors left. It then deletes the first letter of a sentence when it can't find a sentence that's flagged with a grammatical error. Any help? I'm using this as a reference.
(http://msdn.microsoft.com/en-us/library/microsoft.office.interop.word.wdgotoitem.aspx)
The problem is that your $docSelection is not updated. What you do is delete a sentence and then delete the very same sentence from the very same selection again and again and again. You need to update $docSelection after each deletion, like this:
while (!$AnymoreGrammar) {
$docSelection.GoTo($wdGoToGrammaticalError, $wdGoToNext).delete()
$docSelection = $Word.selection
}
It deleted everything from the doc for me, but at least it's looping now
Ended up solving it a bit ago. Found the ProofreadingErrors collection, returned by the document's GrammaticalErrors property, which has a Count property that gives the number of grammatical errors. (msdn.microsoft.com/en-us/library/aa213190(v=office.11).aspx)
So I set the While Loop test to
$errorCount = $doc.GrammaticalErrors.Count
while ($errorCount -ne 0)
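The loop condition above only helps if the count is re-read on every pass, because $errorCount is a snapshot taken once. A minimal sketch of a complete loop, assuming GrammaticalErrors.Item(1) returns the Range of the first flagged sentence and that deleting it lowers the count. The name Remove-FlaggedSentences is hypothetical, and the $MaxPasses limit is added here only as a safeguard in case Word keeps reporting the same error:

```powershell
# Hypothetical helper: delete flagged sentences until none remain.
function Remove-FlaggedSentences {
    param($Document, [int]$MaxPasses = 1000)
    $removed = 0
    # Re-read the collection on each pass: it changes as sentences are deleted.
    while ($Document.GrammaticalErrors.Count -gt 0 -and $removed -lt $MaxPasses) {
        [void]$Document.GrammaticalErrors.Item(1).Delete()
        $removed++
    }
    return $removed
}

# Usage, assuming $doc came from $Word.documents.open($docPath):
# Remove-FlaggedSentences -Document $doc
```

This works on the document's error collection directly instead of moving the selection with GoTo, so it does not depend on where the cursor is.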