Parse HTML Table in PowerShell V3 - powershell

I have an HTML table that I want to parse and convert to XML, CSV, or a PowerShell object.
I tried to do it with HtmlAgilityPack.dll, but without success.
Can anybody point me in the right direction?
Ultimately I want to convert the table to a PSObject and export it to CSV.
So far I only have the beginning of the code: I can reach the rows, but I can't get at the values inside them.
Add-Type -Path C:\Windows\system32\HtmlAgilityPack.dll
$HTML = New-Object HtmlAgilityPack.HtmlDocument
$res = $HTML.Load("C:\Test\Test.html")
$table = $HTML.DocumentNode.SelectNodes("//table/tr/td/nobr")
When I access $table[0..47].InnerHtml I get only the first column of the file;
I can't reach the second column or any of the later ones.
Thanks Ohad

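You can try this to get all the HTML inside the <nobr> tags. I'll let you work out the logic to output what you want.
$ie = New-Object -ComObject "InternetExplorer.Application"
$ie.Navigate("http://urltoyourfile.html")
# wait until the page has finished loading before reading the document
while ($ie.Busy) { Start-Sleep -Milliseconds 100 }
$doc = $ie.Document
($doc.getElementsByTagName("nobr")) | % { $_.innerHTML }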
Output :
Lead User
Accesses
Last Accessed
Average
Max
Min
Total
amirt</NO br>
2
01/20/2013 09:40:47
04:18:17
06:19:26
02:17:09
08:36:35
andream
1
01/20/2013 10:33:01
02:34:37
02:34:37
02:34:37
02:34:37
avnerm
1
01/17/2013 11:34:16
00:30:44
00:30:44
00:30:44
00:30:44
brouria
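One way to parse it (the table has 7 columns, so start a new line after every 7th cell):
$cpt = 0
($doc.getElementsByTagName("nobr")) | % {
# print the cell followed by a separator, without a line break
Write-Host -NoNewline ($_.innerHTML + ";")
$cpt++
# end the current row after the 7th column
if ($cpt % 7 -eq 0) { Write-Host "" }
}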
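Since the goal in the question was a PSObject that can be exported to CSV, the same grouping idea can be sketched without Write-Host. This is only a minimal sketch: $cells is a hypothetical array standing in for the innerHTML values collected above, and the column count of 7 is taken from the header row in the sample output.

```powershell
# Hypothetical flat list of cell values, in the order the <nobr> tags appear
$cells = @(
    'Lead User', 'Accesses', 'Last Accessed', 'Average', 'Max', 'Min', 'Total',
    'amirt', '2', '01/20/2013 09:40:47', '04:18:17', '06:19:26', '02:17:09', '08:36:35',
    'andream', '1', '01/20/2013 10:33:01', '02:34:37', '02:34:37', '02:34:37', '02:34:37'
)

$columnCount = 7                              # assumed from the header row
$headers = $cells[0..($columnCount - 1)]      # first 7 cells are the column names

# Walk the remaining cells 7 at a time and build one object per table row
$rows = for ($i = $columnCount; $i -lt $cells.Count; $i += $columnCount) {
    $row = [ordered]@{}
    for ($c = 0; $c -lt $columnCount; $c++) {
        $row[$headers[$c]] = $cells[$i + $c]
    }
    [pscustomobject]$row
}

# Convert the objects to CSV text
$csv = $rows | ConvertTo-Csv -NoTypeInformation
```

To write a file instead of keeping the text in a variable, $rows can be piped to Export-Csv with -NoTypeInformation. If the cell list is cut off in the middle of a row, the last object will simply have empty properties.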

Related

PowerShell - How can I select the last row of the last table in a Word document, copy it, and insert as a new row at the bottom of the table?

I am trying to increase the level of automation within my day-to-day work, part of which involves adding a line to the end of a table within a report that contains largely the same information, with a few cells changed (new dates).
I have a little experience with VB and C++, but I am very much an amateur when it comes to PowerShell, which seems to be the go-to for task automation.
I have a couple of PowerShell scripts that search through the body of the report and change text, but the last part of the report is a record, and needs appending as opposed to amending.
How would I go about this?
I have tried mangling a few bits of PowerShell code I've found online, to no avail. I have gotten as far as selecting a row in the correct table, but I have no idea how I might then select the last row, copy this and insert it beneath as a new row at the bottom of the table:
$objWord = New-Object -ComObject word.application
$objWord.Visible = $True
$objWord.Documents.Open("FILEPATH")
$FindText = "KEYTEXT"
$objWord.Selection.Find.Execute($FindText)
$objWord.Selection.SelectRow()
Here's a lengthy example with explanations that does what you're looking for:
# open document
$objWord = New-Object -ComObject word.application
$objWord.Visible = $True
$doc = $objWord.Documents.Open("C:\temp\temp.docx")
# search for row
$FindText = "Value2"
$result = $objWord.Selection.Find.Execute($FindText)
$objWord.Selection.SelectRow()
# copy the ranges from searched row
$table = $objWord.Selection.Tables[1]
$copiedCells = $objWord.Selection.Cells | select columnindex,rowindex
# add row at the end of table
$table.Rows[$table.Rows.Count].Select()
$objWord.Selection.InsertRowsBelow(1)
# insert copied text into each column of last row
foreach ($cell in $copiedCells) {
# copy value from cell in copied row
$copiedText = $table.Cell($cell.rowindex, $cell.columnindex).Range.Text
# remove last 2 characters (paragraph and end-of-cell)
$TrimmedText = $copiedText.Remove($copiedText.Length - 2)
# set value for cell in last row
$table.Cell($table.Rows.Count,$cell.columnindex).Range.Text = $TrimmedText
}
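The trimming step can also be tried on its own, without Word. The sketch below assumes, as the comment in the code above says, that a cell's text ends with a paragraph mark (character 13) followed by the end-of-cell marker (character 7); $cellText is a made-up sample value, not something read from a document.

```powershell
# Hypothetical cell text as Word would return it: content + paragraph mark + end-of-cell marker
$cellText = 'Value2' + [char]13 + [char]7

# Variant used above: drop exactly the last two characters
$trimmedByLength = $cellText.Remove($cellText.Length - 2)

# Alternative: strip any trailing paragraph marks or end-of-cell markers
$trimmedByChars = $cellText.TrimEnd([char]13, [char]7)
```

The two variants give the same result for this sample. They differ when a cell ends with extra empty paragraphs: Remove keeps them, while TrimEnd strips them as well.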

Shell script - set autofilter on an existing xls-file

I want to set the filters on an existing .xls-file by running a shell script from the command line.
powershell -c "$excelObj = New-Object -ComObject Excel.Application;$excelWorkBook = $excelObj.Workbooks.Open(\"C:\Users\Desktop\Papierkorb\Test\test2.xlsx\");$excelWorkSheet = $excelObj.WorkSheets.item(\"Sheet1\");$excelWorkSheet.activate();$headerRange = $excelWorkSheet.Range(\"A1\",\"A1\").AutoFilter() | Out-Null;$excelWorkBook.Save();$excelWorkBook.Close();$excelObj.Quit()"
I am getting an error message:
Unable to get the AutoFilter property of the Range class
At line:1 char:231
I tried several adaptions with the Range, but could not fix it.
Thanks for your help,
It is not possible to set AutoFilter on a range without data.
Try putting some text into cell "A1" in the test2.xlsx file (either using Excel, or programmatically with PowerShell, as in the example below). You can even put in an empty string ''.
The following works for me.
$excelObj = New-Object -ComObject Excel.Application;
$excelWorkBook = $excelObj.Workbooks.Open("d:\temp\test2.xlsx");
$excelWorkSheet = $excelObj.WorkSheets.item("Sheet1");
$excelWorkSheet.activate();
$headerRange = $excelWorkSheet.Range("A5","A5") ;
$headerRange.Item(1,1) = 'Something'
$headerRange.AutoFilter() ;
#$headerRange.AutoFilter() | Out-Null;
$excelWorkBook.Save();
$excelWorkBook.Close();
$excelObj.Quit()

How can PowerShell be used on Microsoft Word to get the page number that a hyperlink is found on?

I can't find many examples of PowerShell working with Microsoft Word. I've seen plenty on how to put a page number in a footer, but I don't quite grasp the PowerShell Selection method or object. I also looked at an example of how to count the pages in a set of documents.
I've dug through quite a few books, and only one of them really had anything to do with PowerShell and MS Word. At most they gave a few trivial Excel examples or showed how to create a Word document. I also noticed that Office 365 is the focus of one book and even of an online script-building resource, but I could find nothing like that for Office 2013 and earlier.
This is the script I'm working with now, which isn't really much to look at.
$objWord = New-Object -ComObject Word.Application;
$objWord.Visible = $false;
$objWord.DisplayAlerts = "wdAlertsNone";
# Create the selection object
$Selection = $objWord.Selection;
#$document = $objWord.documents.open("C:\Path\To\Word\Document\file.docx");
$hyperlinks = @($document.Hyperlinks);
#loop through all links in the word document
$hyperlinks | ForEach {
if($_.Address -ne $null)
{
# The character number where the hyperlink text starts
$startCharNumber = $_.Range.Start;
# The character number where the hyperlink text ends
$endCharNumber = $_.Range.End;
# Here is where to calculate which page number the $startCharNumber is found on. How exactly to do this?
# For viewing purposes only. To be used to create a report or index.
Write-Host "Text To Display: " $_.TextToDisplay " URL: " $_.Address " Page Num: " ;
}
}
$objWord.quit();
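You can use Information(wdActiveEndPageNumber) on a range to get the number of the page on which that range ends. The example below attaches to a running instance of Word and uses the active document.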
$word = [System.Runtime.InteropServices.Marshal]::GetActiveObject('Word.Application')
$wdActiveEndPageNumber = 3
$doc = $word.ActiveDocument
foreach ($h in $doc.Hyperlinks) {
$page = $h.Range.Information($wdActiveEndPageNumber)
echo "Page $page : $($h.Address)"
}
Edited following @bibadia's comment.

How to create a powerpoint table using powershell

I'm currently trying to create a calendar in PowerPoint using PowerShell. All I want to do is insert a table into a PowerPoint slide. The table represents the month of January and contains the days of the week, etc.
I did some research and came across a VBScript example, so I tried to create its equivalent in PowerShell.
EDIT3: I was finally able to copy my table from Excel and paste it into my PowerPoint slide using this code:
#Create an instance of Excel.
$xl=New-Object -ComObject "Excel.Application"
$xl.Visible = $True
#Open the Excel file containing the table.
$wb = $xl.workbooks.open("C:\January.xls")
$ws = $wb.ActiveSheet
#Select the table.
$range = $ws.range("A1:G7")
$range.select()
#Copy the table to the clipboard.
$range.copyPicture()
#Create an instance of Powerpoint.
$objPPT = New-Object -ComObject "Powerpoint.Application"
$objPPT.Visible ='Msotrue'
#Add a slide to the presentation.
$project = $objPPT.Presentations.Add()
$slide = $project.Slides.Add(1, 1)
#Paste the table into the slide.
$shape = $slide.Shapes.Paste()
#Position the table.
$shape.Left = 50
$shape.Top = 150
$shape.Width = 300
$shape.Height = 168
Thanks to those who have helped me here and on #powershell
I saw your question in the channel.
This worked for me:
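# $ppt is the PowerPoint.Application COM object, $ifile the path of the presentation
$ppLayoutBlank = 12 # PpSlideLayout value for a blank slide
$presentation = $ppt.Presentations.Open($ifile)
$sl = $presentation.Slides.Add(1, $ppLayoutBlank)
# paste the clipboard contents onto the new slide
$shape = $sl.Shapes.Paste()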
I think you can use this:
# Optionally, make sure you're on the last slide:
$ppt.ActiveWindow.View.GotoSlide( $ppt.ActivePresentation.Slides.Count )
# Specify the
$ppt.ActiveWindow.View.PasteSpecial( "ppPasteOLEObject", "msoFalse", "", 0, "", "msoFalse")
See the MSDN Interop Docs, and thanks to this example: Paste Excel Chart into Powerpoint using VBA

Read word document (*.doc) content with tables etc

I have a word document (2003). I am using Powershell to parse the content of the document.
The document contains a few lines of text at the top, a dozen tables with differing number of columns and then some more text.
I expect to be able to read the document as something like the below:
Read document (make necessary objects etc)
Get each line of text
    If not part of a table
        process as text and Write-Output
    else (part of a table)
        Get table number (by order) and parse output based on columns
    end if
Below is the powershell script that I have begun to write:
$objWord = New-Object -Com Word.Application
$objWord.Visible = $false
$objDocument = $objWord.Documents.Open($filename)
$paras = $objDocument.Paragraphs
foreach ($para in $paras)
{
Write-Output $para.Range.Text
}
I am not sure whether Paragraphs is what I want. Is there anything more suitable for my purpose?
All I am getting now is the entire content of the document. How do I control what I get? I want to read a line, determine whether it is part of a table, and take an action based on which table (by number) it belongs to.
You can enumerate the tables in a Word document via the Tables collection. The Rows and Columns properties of a table let you determine how many rows and columns it has. Individual cells can be accessed via the table's Cell(row, column) method, which returns a Cell object.
Example that will print the value of the cell in the last row and last column of each table in the document:
$wd = New-Object -ComObject Word.Application
$wd.Visible = $true
$doc = $wd.Documents.Open($filename)
$doc.Tables | ForEach-Object {
$_.Cell($_.Rows.Count, $_.Columns.Count).Range.Text
}