Is there a way in which I can programmatically access the document properties of a Word 2007 document?
I am open to using any language for this, but ideally it might be via a PowerShell script.
My overall aim is to traverse the documents somewhere on a filesystem, parse some document properties from these documents, and then collate all of these properties back together into a new Word document.
I essentially want to automatically create a document which is a list of all documents beneath a certain folder of the filesystem; and this list would contain such things as the Title, Abstract and Author document properties; the CreateDate field; etc. for each document.
Check out the Hey, Scripting Guy! articles "How Can I Retrieve the Custom Properties of a Microsoft Word Document?" and "How Can I Add Custom Properties to a Microsoft Word Document?". They should help, or at least give you an idea ;)
I needed to do this in PowerShell running on a server without MS Office applications installed. The trick, as suggested above, is to peek inside the office file and examine the embedded xml files within.
Here's a function that runs like a cmdlet, meaning you can simply save the script in your PowerShell scripts directory and call the function from any other PowerShell script.
# DocumentOfficePropertiesGet
# Example usage
# From a PowerShell script:
# $props = Invoke-Expression "c:\PowerShellScriptFolder\DocumentOfficePropertiesGet.ps1 -DocumentFullPathName ""d:\documents\my excel doc.xlsx"" -OfficeProperties ""dcterms:created;dcterms:modified"""
# Parameters
# DocumentFullPathName -- full path and name of MS Office document
# OfficeProperties -- semi-colon delimited string of property names as they
# appear in the core.xml file. To see these names, rename any
# MS Office document file to have the extension .zip, then look inside
# the zip file. In the docProps folder open the core.xml file. The
# core document properties are nodes under the cp:coreProperties node.
# Example: dcterms:created;dcterms:modified;cp:lastModifiedBy
# Return value
# The function returns a hashtable object -- in the above example, $props would contain
# the name-value pairs for the requested MS Office document properties. In the calling script,
# to get at the values:
# $fooProperty = $props.'dcterms:created'
# $barProperty = $props.'dcterms:modified'
[CmdletBinding()]
[OutputType([System.Collections.Hashtable])]
Param
(
[Parameter(Position=0,
Mandatory=$false,
HelpMessage="Enter the full path name of the document")]
[ValidateNotNullOrEmpty()]
[String] $DocumentFullPathName='e:\temp\supplier_List.xlsx',
[Parameter(Position=1,
Mandatory=$false,
HelpMessage="Enter the Office properties semi-colon delimited")]
[ValidateNotNullOrEmpty()]
[String] $OfficeProperties='dcterms:created; dcterms:modified ;cp:lastModifiedBy;dc:creator'
)
# We need the FileSystem assembly
Add-Type -AssemblyName System.IO.Compression.FileSystem
# This function unzips a zip file -- and it works on MS Office files directly: no need to
# rename them from foo.xlsx to foo.zip. It expects the full path name of the zip file
# and the path name for the unzipped files
function Unzip
{
param([string]$zipfile, [string]$outpath)
[System.IO.Compression.ZipFile]::ExtractToDirectory($zipfile, $outpath) *>$null
}
# Remove spaces from the OfficeProperties parameter
$OfficeProperties = $OfficeProperties.replace(' ','')
# Compose the name of the folder where we will unzip files
$zipDirectoryName = $env:TEMP + "\" + "TempZip"
# delete the zip directory if present
remove-item $zipDirectoryName -force -recurse -ErrorAction Ignore | out-null
# create the zip directory
New-Item -ItemType directory -Path $zipDirectoryName | out-null
# Unzip the files -- i.e. extract the xml files embedded within the MS Office document
unzip $DocumentFullPathName $zipDirectoryName
# get the docProps\core.xml file as [xml]
$coreXmlName = $zipDirectoryName + "\docProps\core.xml"
[xml]$coreXml = get-content -path $coreXmlName
# create an array of the requested properties
$requiredProperties = $OfficeProperties -split ";"
# create a hashtable to return the values
$docProperties = @{}
# Now look for each requested property
foreach($requiredProperty in $requiredProperties)
{
# We will be lazy and ignore the namespaces. We need the local name only
$localName = $requiredProperty -split ":"
$localName = $localName[1]
# Use XPath to fetch the node for this property
$thisNode = $coreXml.coreProperties.SelectSingleNode("*[local-name(.) = '$localName']")
if($null -eq $thisNode)
{
# To the hashtable, add the requested property name and its value -- null in this case
$docProperties.Add($RequiredProperty, $null)
}
else
{
# To the hashtable, add the requested property name and its value
$docProperties.Add($RequiredProperty, $thisNode.innerText)
}
}
#clean up
remove-item $zipDirectoryName -force -recurse
# return the properties hashtable. To do this, just write the object to the output stream
$docProperties
My guess is that your best bet is VB or C# and the Office Interop Assemblies. I'm unaware of a native way (within Powershell) to do what you want.
That said, if you use VB or C#, you could write a PowerShell cmdlet to do the collation. But at that point, it might be simpler to just write a console app that runs as a scheduled task instead.
I recently learned from watching a DNRTV episode that Office 2007 documents are just zipped XML. Therefore, you can change "Document.docx" to "Document.docx.zip" and see the XML files within. You could probably get the properties via an interop assembly in .NET, but it may be more efficient to just look right into the XML (perhaps with LINQ to XML or some native way I am unaware of).
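To illustrate the "look right into the XML" approach, here is a hedged sketch (the document path is hypothetical, and this assumes .NET 4.5+ for System.IO.Compression) that reads docProps/core.xml straight out of the package, without renaming or extracting anything:

```powershell
# An Office document is a zip; read its core-properties XML in place.
Add-Type -AssemblyName System.IO.Compression.FileSystem

$zip = [System.IO.Compression.ZipFile]::OpenRead('C:\docs\Document.docx')  # hypothetical path
try {
    $entry  = $zip.GetEntry('docProps/core.xml')
    $reader = New-Object System.IO.StreamReader($entry.Open())
    [xml]$core = $reader.ReadToEnd()
    $reader.Dispose()
    # Core properties hang off the cp:coreProperties root node, e.g.:
    $core.coreProperties.title
    $core.coreProperties.creator
}
finally {
    $zip.Dispose()
}
```

This avoids both the Office interop assemblies and the temp-folder extraction used in the longer script above.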
I wrote up how to do this back in the Monad beta days. It should still work I think.
Related
How to create an empty file with powershell, similar to "touch" on Linux, with a timestamp in the filename?
not too different from:
md5sum /etc/mtab > "$(date +"%Y_%m_%d_%I_%M_%p").log"
although that file isn't actually empty, but it does have the date incorporated into the filename itself.
Attempts on Powershell:
PS /home/nicholas/powershell/file_ops> New-Item -ItemType file foo.txt
New-Item: The file '/home/nicholas/powershell/file_ops/foo.txt' already exists.
PS /home/nicholas/powershell/file_ops> New-Item -ItemType file bar.txt
Directory: /home/nicholas/powershell/file_ops
Mode          LastWriteTime    Length Name
----          -------------    ------ ----
-----   12/20/2020 10:56 AM         0 bar.txt
PS /home/nicholas/powershell/file_ops> $logfile = "./"+$FN+"-LOG-AddUser_$(get-date -Format yyyymmdd_hhmmtt).txt"
ideally, to generate an arbitrary number of empty log or text files.
see also:
https://community.spiceworks.com/topic/1194231-powershell-adding-a-variable-into-a-log-filename
https://superuser.com/q/502374/977796
https://4sysops.com/archives/understanding-the-powershell-_-and-psitem-pipeline-variables/
https://unix.stackexchange.com/q/278939/101935
In the simplest case, if you want to unconditionally create a file, use New-Item -Force - but note that if the target file exists, its content is discarded:
# CAVEAT: Truncates an existing file. `-ItemType File` is implied.
# * Outputs a [System.IO.FileInfo] instance describing the new file, which
# $null = ... discards here.
# * `Get-Date -UFormat` allows you to perform Unix-style date formatting.
$null = New-Item -Force "$(Get-Date -UFormat "%Y_%m_%d_%I_%M_%p").log"
New-Item's (positionally implied) -Path parameter supports an array of paths, so you can pass multiple paths at once.
By default, an empty file is created, but you may optionally provide (initial) content via the -Value parameter.
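Both points can be combined in one call; a quick sketch (the file names here are made up):

```powershell
# Create several files at once; -Value supplies optional initial content.
# CAVEAT: -Force truncates any of these files that already exist.
$null = New-Item -Force -Path 'app.log', 'db.log', 'web.log' -Value "created $(Get-Date)`n"
```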
More work is needed if you truly want to emulate the touch Unix utility's behavior, which by default means (note that touch supports a variety of options[1]):
If a file doesn't exist yet, create it (as an empty file).
Otherwise, update the last-modified timestamp to the current point in time (and leave the existing content alone).
$file = "$(Get-Date -UFormat "%Y_%m_%d_%I_%M_%p").log"
# Trick: This dummy operation leaves an existing file alone,
# but creates the file if it doesn't exist.
Add-Content -LiteralPath $file -Value $null
(Get-Item -LiteralPath $file).LastWriteTime = Get-Date
Note:
The above is limited to a single file specified by literal path, and doesn't include error handling.
See this answer for custom PowerShell function Touch-File, which implements most of the touch utility's functionality in PowerShell-idiomatic fashion, including the ability to handle wildcard patterns correctly.
Said function is also available as an MIT-licensed Gist. Assuming you have looked at the linked code to ensure that it is safe (which I can personally assure you of, but you should always check), you can install it directly as follows:
irm https://gist.github.com/mklement0/82ed8e73bb1d17c5ff7b57d958db2872/raw/Touch-File.ps1 | iex
[1] The linked page is touch's POSIX spec, which mandates minimum functionality; concrete implementations may support more.
Having trouble with two VNC servers switching off MS Logon Groups being forced. I'm troubleshooting the issue, and one thing I want to do is monitor the config .ini file. I'm relatively new to PowerShell and can't quite get this to work.
Basically, I want the script to check the contents of the configuration file (ultravnc.ini) and see if "MSLogonRequired=1" is a string in that file. If not, I want to append the date to a log file. Eventually I'll do some more with this, but this is my basic need. It's not currently working.
# Variables
$outputFile = "vncMSLogonErrors.txt"
$vncConfig = "C:\Program Files (x86)\uvnc bvba\UltraVNC\ultravnc.ini"
$checkString = "MSLogonRequired=1"
# Get VNC Config File, check for MS Logon setting, write date to file if missing
Get-Content $vncConfig
If (-not $checkString)
{Add-Content $outputFile -Value $(Get-Date)}
Shamus Berube's helpful answer is conceptually simple and works well, if you can assume:
that the line of interest is exactly MSLogonRequired=1, with no variations in whitespace.
that, if the INI file is subdivided into multiple sections (e.g., [admin]), the key name MSLogonRequired is unique across the sections, to prevent false positives.
It is therefore generally preferable to use a dedicated INI-file-parsing command; unfortunately:
PowerShell doesn't come with one, though adding one is being debated
in the meantime you can use the popular PsIni third-party module (see this answer for how to install it and for background information):
Using the PsIni module's Get-IniContent function:
Note: Based on the UltraVNC INI-file documentation, the code assumes that the MSLogonRequired entry is inside the [admin] section of the INI file.
# Variables
$outputFile = "vncMSLogonErrors.txt"
$vncConfig = "C:\Program Files (x86)\uvnc bvba\UltraVNC\ultravnc.ini"
# Check the VNC Config File to see if the [admin] section's 'MSLogonRequired'
# entry, if present, has value '1'.
if ((Get-IniContent $vncConfig).admin.MSLogonRequired -ne '1') {
Add-Content $outputFile -Value (Get-Date)
}
# Variables
$outputFile = "vncMSLogonErrors.txt"
$vncConfig = "C:\Program Files (x86)\uvnc bvba\UltraVNC\ultravnc.ini"
$checkString = "MSLogonRequired=1"
if ((Get-Content $vncConfig) -notcontains $checkString) { Add-Content $outputFile -Value $(Get-Date) }
REASONS WHY THIS IS NOT A DUPLICATE
Since 3 people have already voted to close, I guess I should explain why this question is not a duplicate:
I cannot use cat or >> as these mess up the encoding of the files, which are UTF8 on input and need to be UTF8-BOM on output.
The linked question does not show how to loop through all files that match a given pattern in a directory, and concatenate a single file to each of the matching files on output, plus give the new file a different extension.
Using Set-Content is not Powershell 6 future-proof, since Set-Content will NOT add a BOM marker. In Powershell 5 and below, it sometimes adds a BOM marker and sometimes not, depending on the configuration settings of the executing user. See 'quick note on encoding' at the end of this article.
So in conclusion I am looking for a solution that uses copy (hence the question title) and does NOT use Cat or Set-Content.
I need to loop through certain files in a given directory and run the following on each file:
copy /b BOMMarker.txt+InputFile.dat OutputFile.txt
This inserts the contents of the BOMMarker.txt file at the start of the InputFile.dat and writes the output to OutputFile.txt
I found this question which explains how I can loop through the folder to load each file into Powershell, but how do I apply the "copy /b" command so that I can get the BOM marker at the start of each file?
EDIT
The comment from Jeroen indicates I can just do Set-Content on the output file, as Powershell will automatically add the BOM at the start.
But I also need to change the extension. So the output filename needs to be the same as the input filename, just with a changed extension (from .dat to .txt) and including the BOM.
I am guessing I can use Path.ChangeExtension somehow to do this, but not sure how to combine that with also adding the BOM.
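For reference, [IO.Path]::ChangeExtension does exactly that one piece; a minimal sketch (the path is hypothetical):

```powershell
# ChangeExtension swaps only the extension; the rest of the path is untouched.
[System.IO.Path]::ChangeExtension('C:\data\DIL_BG_TXN_001.dat', '.txt')
# → C:\data\DIL_BG_TXN_001.txt
```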
EDIT - for Bounty
The example answer I posted does not work in all environments I tested it in, and I do not know why (possibly different default PowerShell settings); also, it is not future-proof, since PowerShell 6 will not output a BOM by default.
From the given directory, I need to process all files that match the filter (DIL_BG_TXN*.dat).
For each of those files, I need to copy it with a BOM at the start but the resultant new file needs to be the same name but with the extension .txt instead of .dat.
This solution uses streams, which reliably read and write the bytes as-is:
$bomStream = [IO.File]::OpenRead('BOMMarker.txt')
$location = "" # set this to the folder location
$items = Get-ChildItem -Path $location -Filter DIL_BG_TXN*.dat
foreach ($item in $items) {
$sourceStream = [IO.File]::OpenRead($item.FullName)
$targetStream = [IO.File]::OpenWrite([IO.Path]::ChangeExtension($item.FullName, '.txt'))
$bomStream.CopyTo($targetStream)
$sourceStream.CopyTo($targetStream)
$targetStream.Flush()
$targetStream.Close()
$sourceStream.Close()
$bomStream.Position = 0
}
$bomStream.Close()
Of course, adjust the path to BOMMarker.txt (first line) to match its actual location.
This finally worked:
$Location = "C:\Code\Bulgaria_Test"
$items = Get-ChildItem -Path $Location -Filter DIL_BG_TXN*.dat
ForEach ($item in $items) {
Write-Host "Processing file - " $item
cmd /c copy /b BOMMarker.txt+$item ($item.BaseName + '.txt')
}
Description:
Set the directory location where all the .dat files are.
Load only those files that match the filter into the array $items.
Loop through each $item in the array.
With each $item, call cmd shell with the copy /b command and concatenate the bom marker file with the $item file and write the result to the basename of $item plus the new extension.
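One caveat: as written, copy /b resolves BOMMarker.txt and the output name against the current directory, not $Location. If the script may be run from elsewhere, a sketch with explicit paths (the marker-file location is an assumption, and this assumes the paths contain no spaces) would be:

```powershell
$Location  = 'C:\Code\Bulgaria_Test'
$bomMarker = 'C:\Code\BOMMarker.txt'   # assumed location of the BOM-marker file
Get-ChildItem -Path $Location -Filter DIL_BG_TXN*.dat | ForEach-Object {
    # Write the output next to the input, with the .txt extension.
    $target = Join-Path $Location ($_.BaseName + '.txt')
    cmd /c copy /b $bomMarker+$($_.FullName) $target
}
```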
I am wondering if anyone can give me some insight as to how to setup a script that will look at a folder on your computer (say C:\Windows\System32) and be able to look at the files within the folder to compare against a predefined Hash Table.
This hash table holds the expected files in the "system32" folder and their correct versions, so the script can locate files that have wrong versions. The hash table consists of file names and their versions.
Example:
advapi32.dll 6.1.7600.16385
comctl32.dll 6.1.7600.16385
comdlg32.dll 6.1.7600.16385
gdi32.dll 6.1.7600.16385
After files with wrong versions are found, the output would list the files in question in a table format.
I know I have to use [System.Diagnostics.FileVersionInfo] to find a method that helps get the version info for a file. I know I'd have to use the .productVersion property for my comparison.
I'm not sure how to get started. If there's more info you need, please let me know.
I'm lost as to where you're stuck. If you need the files in a folder, and you aren't googling "files in folder powershell" then ... what are you doing?
Your previous two questions and answers include using: if / equality testing, gci (get-childitem), using that to get a filename with path, calling .Net framework functions. That's mostly all you need, all they are missing is the hashtable and a loop:
$files = @(Get-ChildItem -File "C:\Windows\system32\") # NB: @() forces a one-item result to still be an array
foreach ($file in $files) {
$fileVersion = [System.Diagnostics.FileVersionInfo]::GetVersionInfo($file.FullName).ProductVersion
$hashVersion = $hash[$file.Name]
if ($hashVersion -ne $fileVersion) {
write "$($file.Name) is different (file: $fileVersion, expected: $hashVersion)"
}
}
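The $hash table in the loop above is assumed to exist already; built from the file/version pairs listed in the question, it could be defined like this:

```powershell
# Expected versions, keyed by file name (values from the question's example list)
$hash = @{
    'advapi32.dll' = '6.1.7600.16385'
    'comctl32.dll' = '6.1.7600.16385'
    'comdlg32.dll' = '6.1.7600.16385'
    'gdi32.dll'    = '6.1.7600.16385'
}
```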
I'm trying to write a script in powershell to batch convert video files.
The way I intend to use it is to go to some folder full of video files and run it. It uses a conversion program that can be run in "command-line mode" (named handbrake) and saves the converted files with "-android" appended to them before the file extension. For example, if I have a file named video1.avi in the folder, after running the script the folder has 2 files: video1.avi and video1-android.avi
The reason I want to do this this way is so that I can check if, for each video file, there is a converted version of it (with -android appended to the name). And if so, skip the conversion for that file.
And this is where I'm having trouble. The problem is Test-Path's behavior (the cmdlet I'm using to test whether a file exists).
What happens is, if the video file has an "unusual" name (for example in my case it's video[long].avi) Test-Path always returns False if you try to check if that file exists.
An easy way for you to test this is for example to do this:
Go to an empty folder,
run notepad to create a file with "[" in its name:
notepad test[x].txt
Save the file
then do this:
Get-ChildItem | ForEach-Object {Test-Path $_.FullName}
It does not return true! It should, right? Well, it doesn't if the file has "[" in its name (I didn't check for any other special characters).
I've realized that if you escape the "[" and "]" it works
Test-Path 'test`[x`].txt'
returns true.
How can I go around this issue? I want to be able to: given a BaseName of a file, append it "-android.avi" and check if a file with that name exists.
Thanks,
Rui
Many PowerShell cmdlets have Path parameters that support wildcarding. As you have observed, in PowerShell not only is * a wildcard but [ and ] are also considered wildcard characters. You can read more about this in the help topic about_Wildcards.
For your issue, if you don't need wildcarding then I would recommend using the -LiteralPath parameter. This parameter doesn't support wildcarding and accepts [ and ] as literal path characters e.g.:
Get-ChildItem | ForEach {Test-Path -LiteralPath `
"$([io.path]::ChangeExtension($_.FullName,'avi'))"}
FYI, the reason piping the output of Get-ChildItem directly into Test-Path works is because the LiteralPath parameter has an alias "PSPath" that maps to the PSPath property on the FileInfo object output by Get-ChildItem. That property gets bound to the LiteralPath (er PSPath) parameter "by property name".
dir | % {Test-Path "$($_.BaseName)-android.avi"}