Upgrading from iTextSharp to iText7 - powershell

I'm in the process of updating our scripts to ensure they remain functional, and discovered that iText7 has replaced iTextSharp. My needs are simple: read form fields. I already know how to read a form field; I'm just checking whether there's a more streamlined way to do it, as it seemed easier in iTextSharp.
Here's the old code we're using with iTextSharp (the $form is being fed to the $reader via a foreach loop):
#create pdf reader object and load form
$reader = New-Object iTextSharp.text.pdf.PdfReader -ArgumentList $form.PSPath.Replace("Microsoft.PowerShell.Core\FileSystem::","")
#Get the data I need
$First = $reader.AcroFields.GetField("FirstName")
Simple. When playing with iText7 though, it seems to lose its simplicity. Here's what I have for iText7:
#Create pdf reader and load form
$Reader = [iText.Kernel.Pdf.PdfReader]::new("C:\temp\TestForm.pdf")
#Create PDFDoc object?
$PdfDoc = [iText.Kernel.Pdf.PdfDocument]::new($Reader)
#What? Why?
$Form = [iText.Forms.PdfAcroForm]::getAcroForm($PdfDoc, $True)
#Get the data I need. Oh wait, I am unable to read it.
$fName = $Form.GetField("FirstName")
#Finally...
$First = $fName.GetValue()
#Close the document so the PDF file is not left locked
$PdfDoc.Close()
I haven't had any luck finding simple example code; everyone seems to be creating web forms on the fly or parsing thousands of PDFs for data analytics. I'm also just a lowly SysAdmin, not a dev. Please tell me there's an easier way to read a single form field in iText7. Thanks in advance!

Simplicity is not necessarily measured by the number of lines of code. Your way of reading form fields in iText 7 is correct. You need a couple more lines because iText 7 separates functionality into modules much more clearly. Low-level PDF access lives in the kernel module (PdfReader and PdfDocument in iText.Kernel.Pdf), while AcroForm handling lives in the forms module (PdfAcroForm in iText.Forms). This has big advantages over iText 5 and gives user code more room for flexibility.
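By the way, not being able to call $Form.GetField("FirstName").GetValue() in one step has nothing to do with iText. That kind of chaining works in C# and Java, and PowerShell also accepts a method call chained onto the object another method returns. Note that GetValue() returns a PdfObject; if you want the field value as a plain string, PdfFormField also offers GetValueAsString().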

Related

HtmlWebResponseObject.ParsedHtml replacement in PowerShell Core 6

My goal is to parse an html file retrieved with Invoke-WebRequest. If possible I'd like to avoid any external libraries.
The problem I am facing is that since PowerShell 6, Invoke-WebRequest returns a BasicHtmlWebResponseObject instead of an HtmlWebResponseObject. The Basic version lacks the ParsedHtml property. Is there a good alternative for parsing HTML in PowerShell Core 6?
I've tried Select-Xml, but my HTML is not entirely valid (for example, it has a missing closing tag), so it fails to parse the result.
Another alternative I've found is New-Object -ComObject "HTMLFile", but as I understand it, this relies on Internet Explorer for parsing, which I'd like to avoid.
There is a very similar question here, but sadly it has had no answers or activity for 8 months.
As mentioned in the comments, it is not really possible without a library. One very good library you could use is AngleSharp for .NET. It has great HTML parsing capabilities, and .NET code works well from PowerShell; have a look at this link.
Here is an example from their website:
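using System.Linq;   // Select()
using AngleSharp;    // Configuration, BrowsingContext
// Run inside an async method because of the await below.
var config = Configuration.Default.WithDefaultLoader();
var address = "https://en.wikipedia.org/wiki/List_of_The_Big_Bang_Theory_episodes";
var context = BrowsingContext.New(config);
var document = await context.OpenAsync(address);
// Third cell of every episode row (rows with class "vevent")
var cellSelector = "tr.vevent td:nth-child(3)";
var cells = document.QuerySelectorAll(cellSelector);
var titles = cells.Select(m => m.TextContent);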
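This example shows why Select-Xml chokes on this kind of page while an HTML parser copes with it. It is a sketch in Python's standard library, so it is not PowerShell and not AngleSharp. The language choice, the ThirdCellCollector class, the helper names and the sample markup are all hypothetical and made up for illustration. A strict XML parser (xml.etree.ElementTree) rejects a table with unclosed <td> tags. The tolerant html.parser module walks the same markup and collects roughly what the selector tr.vevent td:nth-child(3) matches. For simplicity it only counts <th>/<td> cells as row children.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample: the first row leaves its <td> elements unclosed,
# which browsers accept but which is not well-formed XML.
SAMPLE = """<table>
<tr class="vevent"><th>1</th><td>1<td>Pilot<td>2007</tr>
<tr><td>x<td>y<td>not an episode</tr>
<tr class="vevent"><th>2</th><td>2</td><td>The Big Bran Hypothesis</td></tr>
</table>"""


def parses_as_xml(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class ThirdCellCollector(HTMLParser):
    """Rough stand-in for the CSS selector 'tr.vevent td:nth-child(3)'.

    Only <th>/<td> cells are counted as children of a row.
    """

    def __init__(self):
        super().__init__()
        self.in_vevent_row = False
        self.cell_index = 0
        self.capturing = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            classes = (dict(attrs).get("class") or "").split()
            self.in_vevent_row = "vevent" in classes
            self.cell_index = 0
            self.capturing = False
        elif tag in ("td", "th") and self.in_vevent_row:
            # A new cell implicitly ends the previous one, so a missing </td> is harmless.
            self.cell_index += 1
            self.capturing = tag == "td" and self.cell_index == 3
            if self.capturing:
                self.texts.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th", "tr"):
            self.capturing = False
        if tag == "tr":
            self.in_vevent_row = False

    def handle_data(self, data):
        if self.capturing:
            self.texts[-1] += data


def third_cell_texts(markup):
    """Return the text of the 3rd cell of each <tr class="vevent">."""
    parser = ThirdCellCollector()
    parser.feed(markup)
    parser.close()
    return parser.texts


print(parses_as_xml(SAMPLE))     # False: strict XML parsing fails
print(third_cell_texts(SAMPLE))  # ['Pilot', 'The Big Bran Hypothesis']
```

Unlike AngleSharp, this builds no DOM and understands no CSS. It only illustrates why a tolerant HTML tokenizer succeeds where an XML parser gives up.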

Getting OLE Exception while closing document in Perl

I am getting the following exception in Perl. I am also new to Perl.
The exception is:
Win32::OLE<0.1709> error 0x800a1423
in METHOD/PROPERTYGET "Close" at getWordComments.pl line no 350
Here is the relevant code from getWordComments.pl, where the exception occurs.
A) The following code opens the document:
#Open the document in MS Word
use Win32::OLE;
{
no warnings;
use Win32::OLE::Const 'Microsoft.Word'; # wd constants
}
$word=Win32::OLE->new('Word.Application');
$word->{Visible} = 1;
$word->{DisplayAlerts} = 0;
$Document=$word->Documents->Open({Filename => $filename, ReadOnly => 1});
B) Then I read the comments.
C) The following code closes the document:
$Document->{Saved}=1;
$Document->Close;
undef $Document;
#Close Word
$word->Quit;
undef $word;
Is this a problem with the Office version?
The failing document is a .docx; it works properly for .doc.
Please help me solve this issue.
I am reading the comments from the document and saving the document on the server. It works fine for the rest of the documents, with both *.docx and *.doc extensions.
Could you also show me how I can do this in Perl?
I want to close the document in both Office 2003 and Office 2007.
Is this a version issue?
Thanks and regards
Arvind Porlekar
Wait! You're opening it ReadOnly and then marking it as Saved?? That alone raises a red flag for me.
The documentation that I can find seems to indicate that this is an issue regarding saving to a different format. That might account for the it-works-in-one-but-not-the-other case.
Also, I've seen indications that this is a COM error, so it helps to know something about COM. Most likely, doc and docx are completely different implementations of the same interface, originally defined by the older doc logic. The older implementation (doc) may be fine with you opening a document ReadOnly and then marking it as saved, while the newer implementation takes the view that you really should not do this.
As you can see here, one of the arguments Close accepts is OriginalFormat. If you don't specify it, it could default to the doc format, which then throws an exception because you are trying to save in a different format without explicit instructions. Another of the arguments is SaveChanges.
So it could be that you are implicitly telling Word to save changes in the default doc format. That works for doc files, but for docx files it fails, understandably, with a complaint about saving in a different format.

how to stop macros running when opening a Word document using OLE Interop?

As the title suggests, I have a .Net application which uses interop to open documents in Word. I have set
app.AutomationSecurity = Microsoft.Office.Core.MsoAutomationSecurity.msoAutomationSecurityForceDisable
before opening the document. According to the documentation, this "Disables all macros in all files opened programmatically, without showing any security alerts".
However, when I attempt to open one specific document I get a dialog box on the screen that says "could not load an object because it is not available on this machine". It's a customer document but I believe it contains a macro with references to a COM object which I don't have installed.
Am I doing something stupid? Is there any way to actually disable macros when opening a Word document?
Try:
WordBasic.DisableAutoMacros 1
Bizarrely, this relies on a throwback to pre-VBA days, but it still seems to be the most reliable way to ensure that no auto macros are triggered (in any document; you may want to turn auto macros back on afterwards by passing the parameter 0).
I recently had a project where I had to process 6,000 Word templates (yes, templates, not documents) many of which had oddball stuff like macros, etc. I was able to process all but 6 using this technique. (I never did figure out what the problem was with those 6).
EDIT: for a discussion of how to call this from C#, see: http://www.dotnet247.com/247reference/msgs/56/281785.aspx
In C# you can use:
(_wordApp.WordBasic as dynamic).DisableAutoMacros();
The whole code I'm using is:
using Word = Microsoft.Office.Interop.Word;
using Microsoft.Office.Core; // MsoAutomationSecurity, MsoFileValidationMode
private Word.Application _wordApp;
...
_wordApp = new Word.Application
{
Visible = false,
ScreenUpdating = false,
DisplayAlerts = Word.WdAlertLevel.wdAlertsNone,
FileValidation = MsoFileValidationMode.msoFileValidationSkip
};
_wordApp.Application.AutomationSecurity = MsoAutomationSecurity.msoAutomationSecurityForceDisable;
(_wordApp.WordBasic as dynamic).DisableAutoMacros();

How can I add endnotes to a Word doc using PowerShell?

Hello, I'm looking for a way to search for a word in a Word document and add an endnote (a special type of footnote) with a definition of the word as the endnote text. This would allow me to hover over that word and have the definition pop up like a tooltip.
I know I need to use reflection, but I'm new to the whole reflection thing and all my attempts have fallen flat.
I've found the reference for endnotes here: http://msdn.microsoft.com/en-us/library/microsoft.office.interop.word.endnotes.add%28office.11%29.aspx
I've tried loading C:\WINDOWS\Assembly\Gac\Microsoft.Office.Interop.Word\11.0.0.0__71e9bce111e9429c\Microsoft.Office.Interop.Word.dll using reflection, but I don't know what to do once I've loaded it. When I try to create a New-Object, it still asks me whether I've loaded the appropriate DLL.
I also tried a different approach, loading the MS Word application as a COM object, but I wasn't able to figure out how to select the text I wanted and then set an endnote.
Any suggestions for this would be greatly appreciated!
-Skyler
I am not too familiar with the Word object model, but if you can handle that part I can tell you how to get an instance of Word running and automated. It's quite simple actually.
$Application = New-Object -ComObject Word.Application
$Application.Visible = $true
$Document = $Application.Documents.Add()
The key is Visible = $true; otherwise Word will be running but hidden. From there you can use the methods of the Word Application object to create a new document and automate it. If you're using Word 2007's .docx format, you could instead extract the ZIP package and work with the XML inside the document directly. But dealing with XML namespaces is a hassle, so this may not be as straightforward.
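This sketch shows what that namespace handling looks like. It is written in Python with the standard library only. The language choice and the function names docx_paragraph_texts and make_test_docx are assumptions of this example, not part of the answer. A .docx file is a ZIP package whose main text lives in word/document.xml under the WordprocessingML namespace. Endnotes are usually stored in a separate part, word/endnotes.xml. The reader below pulls paragraph text out of the main part. make_test_docx builds a stripped-down package that is enough for this reader, but it is not a file Word would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# WordprocessingML main namespace used by word/document.xml in .docx packages.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def docx_paragraph_texts(docx_bytes):
    """Return the plain text of each <w:p> paragraph in word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    texts = []
    # Tags must be matched together with their namespace: "{namespace}p", not "p".
    for para in root.iter(f"{{{W_NS}}}p"):
        # Paragraph text is split across runs (<w:r>) that hold <w:t> elements.
        runs = para.iterfind(".//w:t", NS)
        texts.append("".join(t.text or "" for t in runs))
    return texts


def make_test_docx(paragraphs):
    """Build a minimal ZIP holding only word/document.xml (test fixture only)."""
    body = "".join(
        f"<w:p><w:r><w:t>{escape(p)}</w:t></w:r></w:p>" for p in paragraphs
    )
    xml = f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", xml)
    return buffer.getvalue()


sample = make_test_docx(["Hello", "Endnotes & footnotes"])
print(docx_paragraph_texts(sample))  # ['Hello', 'Endnotes & footnotes']
```

Adding an endnote this way would also mean editing word/endnotes.xml and inserting a reference run into document.xml. That is considerably more work, which is why driving Word through COM is usually simpler.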
Word Object Model Stuff
The Scripting Guy recently posted a solution to this: http://blogs.technet.com/heyscriptingguy/archive/2009/10/14/hey-scripting-guy-october-14-2009.aspx

Add a "Hyperlink" item type to a list using PowerShell in Sharepoint

I've been a SharePoint admin for a while, and now have been tasked with a bit more of a developer role - which I'm still very much learning. Most things I've been able to figure out on my own or through Google, but this one has me stumped.
For one particular task I need to use PowerShell to script adding items to a list. Normally - not a difficult task. These steps are all over the web. However, I have yet to find anywhere that will tell you how to add a "Hyperlink" type of item to a list.
I can add one using the following code:
$NewItem = $MyList.Items.Add()
$NewItem["My Hyperlink Column"] = $($url.url)
$NewItem.Update()
But I want to set the name/title of the link as well and that's what stumps me. I don't want to have to create a separate column in the list and populate that with the link name, and use code similar to above to populate the url/link.
Does this work for you? I don't have a Sharepoint install available to test on, this is from memory:
$NewItem = $MyList.Items.Add()
$NewItem["My Hyperlink Column"] = "$($url.url), <Title>"
$NewItem.Update()
james
Thanks James! That was very close, and I think it would work if I were specifying a single item.
Here's my full solution (with some extra bits):
$enumsite = new-object microsoft.sharepoint.spsite($SubWebUrl)
foreach ($url in $enumsite.allwebs)
{
$NewItem = $MyList.Items.Add()
$NewItem["My Hyperlink Column"] = "$($url.url), $($url.title)"
$NewItem.Update()
# Each SPWeb returned by AllWebs should be disposed explicitly
$url.Dispose()
}
$enumsite.Dispose()
Perhaps this will help someone else out in the future.