Powershell automated deletion of specified SharePoint documents - powershell

We have a csv file with approximately 8,000 SharePoint document file URLs - the files in question they refer to have to be downloaded to a file share location, then deleted from the SharePoint. The files are not located in the same sites, but across several hundred in a server farm. We are looking to remove only the specified files - NOT the entire library.
We have the following script to effect the download, which creates the folder structure so that the downloaded files are separated.
param (
[Parameter(Mandatory=$True)]
[string]$base = "C:\Export\",
[Parameter(Mandatory=$True)]
[string]$csvFile = "c:\export.csv"
)
write-host "Commencing Download"
$date = Get-Date
add-content C:\Export\Log.txt "Commencing Download at $date":
$webclient = New-Object System.Net.WebClient
$webclient.UseDefaultCredentials = $true
$files = (import-csv $csvFile | Where-Object {$_.Name -ne ""})
$line=1
Foreach ($file in $files) {
$line = $line + 1
if (($file.SpURL -ne "") -and ($file.path -ne "")) {
$lastBackslash = $file.SpURL.LastIndexOf("/")
if ($lastBackslash -ne -1) {
$fileName = $file.SpURL.substring(1 + $lastBackslash)
$filePath = $base + $file.path.replace("/", "\")
New-Item -ItemType Directory -Force -Path $filePath.substring(0, $filePath.length - 1)
$webclient.DownloadFile($file.SpURL, $filePath + $fileName)
$url=$file.SpURL
add-content C:\Export\Log.txt "INFO: Processing line $line in $csvFile, writing $url to $filePath$fileName"
} else {
$host.ui.WriteErrorLine("Exception: URL has no backslash on $line for filename $csvFile")
}
} else {
$host.ui.WriteErrorLine("Exception: URL or Path is empty on line $line for filename $csvFile")
}
}
write-Host "Download Complete"
Is there a way we could get the versions for each file?
I have been looking for a means to carry out the deletion, using the same csv file as reference - all of the code I have seen refers to deleting entire libraries, which is not desired.
I am very new to PowerShell and am getting lost. Can anyone shed some light?
Many thanks.

This looks like it might be useful. It's a different approach and would need to be modified to pull in the file list from your CSV but it looks like it generally accomplishes what you are looking to do.
https://sharepoint.stackexchange.com/questions/6511/download-and-delete-documents-using-powershell

Related

Powershell: Converting Headers from .msg file to .txt - Current directory doesn't pull header information, but specific directory does

So I am trying to make a script to take a batch of .msg files, pull their header information and then throw that header information into a .txt file. This is all working totally fine when I use this code:
$directory = "C:\Users\IT\Documents\msg\"
$ol = New-Object -ComObject Outlook.Application
$files = Get-ChildItem $directory -Recurse
foreach ($file in $files)
{
$msg = $ol.CreateItemFromTemplate($directory + $file)
$headers = $msg.PropertyAccessor.GetProperty("http://schemas.microsoft.com/mapi/proptag/0x007D001E")
$headers > ($file.name +".txt")
}
But when I change the directory to use the active directory where the PS script is being run from $directory = ".\msg\", it will make all the files into text documents but they will be completely blank with no header information. I have tried different variations of things like:
$directory = -Path ".\msg\"
$files = Get-ChildItem -Path $directory
$files = Get-ChildItem -Path ".\msg\"
If anyone could share some ideas on how I could run the script from the active directory without needing to edit the code to specify the path each location. I'm trying to set this up so it can be done by simply putting it into a folder and running it.
Thanks! Any help is very appreciated!
Note: I do have outlook installed, so its not an issue of not being able to pull the headers, as it works when specifying a directory in the code
The easiest way might actually be to do it this way
$msg = $ol.CreateItemFromTemplate($file.FullName)
So, the complete script would then look something like this
$directory = ".\msg\"
$ol = New-Object -ComObject Outlook.Application
$files = Get-ChildItem $directory
foreach ($file in $files)
{
$msg = $ol.CreateItemFromTemplate($file.FullName)
$headers = $msg.PropertyAccessor.GetProperty("http://schemas.microsoft.com/mapi/proptag/0x007D001E")
$headers > ($file.name +".txt")
}
All that said, it could be worthwhile reading up on automatic variables (Get-Help about_Automatic_Variables) - for instance the sections about $PWD, $PSScriptRoot and $PSCommandPath might be useful.
Alternative ways - even though they seem unnecessarily complicated.
$msg = $ol.CreateItemFromTemplate((Get-Item $directory).FullName + $file)
Or something like this
$msg = $ol.CreateItemFromTemplate($file.DirectoryName + "\" $file)

Powershell Writing to .XLSX is Corrupting the Files

I have a Powershell script that loops through .xslx files in a folder and password protects them with the file name (for now.) I have no problem looping through and writing to .xls, but when I try to open an .xlsx file after writing it with Powershell - I get the error:
Excel cannot open the file 'abcd.xlsx' because the file format or file
extension is not valid. Verify that the file has not been corrupted
and that the file extension matches the format of the file.
Here's the script:
function Release-Ref ($ref) {
([System.Runtime.InteropServices.Marshal]::ReleaseComObject(
[System.__ComObject]$ref) -gt 0)
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
}
$e = $ErrorActionPreference
$ErrorActionPreference="continue"
foreach ($f in Get-ChildItem "C:"){
try{
$ff = $f
$xlNormal = -4143
$s = [System.IO.Path]::GetFileNameWithoutExtension($f)
$xl = new-object -comobject excel.application
$xl.Visible = $False
$xl.DisplayAlerts = $False
$wb = $xl.Workbooks.Open($ff.FullName)
$wb.sheets(1).columns("A:S").entirecolumn.AutoFit()
$wb.sheets(1).columns("N").NumberFormat = "0.0%"
$a = $wb.SaveAs("C:\Out\" + $s + ".xls",$xlNormal,$s) #works
#$a = $wb.SaveAs("C:\Out\" + $s + ".xlsx",$xlNormal,$s) #doesn't work
$a = $xl.Quit()
$a = Release-Ref($ws)
$a = Release-Ref($wb)
$a = Release-Ref($xl)
}
catch {
Write-Output "Exception"
$ErrorActionPreference=$e;
}
}
I've searched other questions but can't find any other examples of the same issues writing from Powershell. Thank you.
The problem is caused because Xls is different a format from Xlsx. Older Excels before version 2007 used binary formats. The 2007 Office introduced new formats called Office Open Xml, which Xslx uses.
Excel is smart enough to check both file extension and file format. Since saving a binary file with new versions' extension creates a conflict, the error message hints for this possibility too:
and that the file extension matches the format of the file.
Why doesn't Excel just open the file anyway? I guess it's a security feature that prevents unintentional opening of Office documents. Back in the days, Office macro viruses were a bane of many offices. One of the main infection vectors was to trick users to open files without precautions. Unlike classic viruses, macro ones infected application data (including default template files) instead of OS binaries, but that's another a story.
Anyway, to work in proper a format, use proper version value. That would be -4143 for Xls and 51 for Xlsx. What's more, Get-ChildItem returns a collection of FileInfo objects, and file extension is there in Extension property. Like so,
# Define Xls and Xlsx versions
$typeXls = -4143
$typeXls = 51
foreach ($f in Get-ChildItem "C:"){
try{
$ff = $f
...
# Select saveas type to match original file extension
if($f.extension -eq '.xsl') { $fType = $typeXls }
else if($f.extension -eq '.xslx') { $fType = $typeXlsx }
$a = $wb.SaveAs("C:\Out\" + $s + $.extension, $fType, $s)
Working with com objects is too complicated sometimes with excel. I recommend the import-excel module.
Install-Module -Name ImportExcel
Then you can do something like this.
function Release-Ref ($ref) {
$e = $ErrorActionPreference
$ErrorActionPreference="continue"
foreach ($f in Get-ChildItem $file){
try{
$filePass = gci $f
$path = split-path $f
$newFile = $path + "\" + $f.BaseName + "-protected.xlsx"
$f | Export-excel $newFile -password $filePass -NoNumberConversion * -AutoSize
}
catch {
Write-Output "Exception"
$ErrorActionPreference=$e;
}
}
}

Word com object failing

SCRIPT PURPOSE
The idea behind the script is to recursively extract the text from a large amount of documents and update a field in an Azure SQL database with the extracted text. Basically we are moving away from Windows Search of document contents to an SQL full text search to improve the speed.
ISSUE
When the script encounters an issue opening the file such as it being password protected, it fails for every single document that follows. Here is the section of the script that processes the files:
foreach ($list in (Get-ChildItem ( join-path $PSScriptRoot "\FileLists\*" ) -include *.txt )) {
## Word object
$word = New-Object -ComObject word.application
$word.Visible = $false
$saveFormat = [Enum]::Parse([Microsoft.Office.Interop.Word.WdSaveFormat], "wdFormatText")
$word.DisplayAlerts = 0
Write-Output ""
Write-Output "################# Parsing $list"
Write-Output ""
$query = "INSERT INTO tmp_CachedText (tCachedText, tOID)
VALUES "
foreach ($file in (Get-Content $list)) {
if ($file -like "*-*" -and $file -notlike "*~*") {
Write-Output "Processing: $($file)"
Try {
$doc = $word.Documents.OpenNoRepairDialog($file, $false, $false, $false, "ttt")
if ($doc) {
$fileName = [io.path]::GetFileNameWithoutExtension($file)
$fileName = $filename + ".txt"
$doc.SaveAs("$env:TEMP\$fileName", [ref]$saveFormat)
$doc.Close()
$4ID = $fileName.split('-')[-1].replace(' ', '').replace(".txt", "")
$text = Get-Content -raw "$env:TEMP\$fileName"
$text = $text.replace("'", "''")
$query += "
('$text', $4ID),"
Remove-Item -Force "$env:TEMP\$fileName"
<# Upload to azure #>
$query = $query.Substring(0,$query.Length-1)
$query += ";"
Invoke-Sqlcmd #params -Query $Query -ErrorAction "SilentlyContinue"
$query = "INSERT INTO tmp_CachedText (tCachedText, tOID)
VALUES "
}
}
Catch {
Write-Host "$($file) failed to process" -ForegroundColor RED;
continue
}
}
}
Remove-Item -Force $list.FullName
Write-Output ""
Write-Output "Uploading to azure"
Write-Output ""
<# Upload to azure #>
Invoke-Sqlcmd #params -Query $setQuery -ErrorAction "SilentlyContinue"
$word.Quit()
TASKKILL /f /PID WINWORD.EXE
}
Basically it parses through a folder of .txt files that contain x amount of document paths, creates a T-SQL update statement and runs against an Azure SQL database after each file is fully parsed. The files are generated with the following:
if (!($continue)) {
if ($pdf){
$files = (Get-ChildItem -force -recurse $documentFolder -include *.pdf).fullname
}
else {
$files = (Get-ChildItem -force -recurse $documentFolder -include *.doc, *.docx).fullname
}
$files | Out-File (Join-Path $PSScriptRoot "\documents.txt")
$i=0; Get-Content $documentFile -ReadCount $interval | %{$i++; $_ | Out-File (Join-Path $PSScriptRoot "\FileLists\documents_$i.txt")}
}
The $interval variable defines how many files are set to be extracted for each given upload to azure. Initially i had the word object being created outside the loop and never closed until the end. Unfortunately this doesn't seem to work as every time the script hits a file it cannot open, every file that follows will fail, until it reaches the end of the inner foreach loop foreach ($file in (Get-Content $list)) {.
This means that to get the expected outcome i have to run this with an interval of 1 which takes far too long.
This is a shot in the dark
But to me it sounds like the reason its failing is because the Word Com object is now prompting you for some action due since it cannot open the file so all following items in the loop also fail. This might explain why it works if you set the $Interval to 1 because when its 1 it is closing and opening the Com object every time and that takes forever (I did this with excel).
What you can do is in your catch statement, close and open a new Word Com object which should lets you continue on with the loop (but it will be a bit slower if it needs to open the Com object a lot).
If you want to debug the problem even more, set the Com object to be visible, and slowly loop through your program without interacting with Word. This will show you what is happening with Word and if there are any prompts that are causing the application to hang.
Of course, if you want to run it at full speed, you will need to detect which documents you can't open before hand or you could multithread it by opening several Word Com objects which will allow you to load several documents at a time.
As for...
ISSUE
When the script encounters an issue opening the file such as it being password protected, it fails for every single document that follows.
... then test for this as noted here...
How to check if a word file has a password?
$filename = "C:\path\to\your.doc"
$wd = New-Object -COM "Word.Application"
try {
$doc = $wd.Documents.Open($filename, $null, $null, $null, "")
} catch {
Write-Host "$filename is password-protected!"
}
... and skip the file to avoid the failure of the remaining files.

Extracting text from word documents

TASK
Extract text from .doc, .docx and .pdf files and upload content to an Azure SQL database. Needs to be fast as its running over millions of documents.
ISSUES
Script starts to fail if one of the documents has an issue. Some that i have come across are:
This file failed to open last time you tried - Open readonly
File is corrupt
SCRIPT
Firstly i generate a list of files containing 100 file paths. This is so i can continue execution if i need to stop it and / or it errors out:
## Word object
if (!($continue)) {
$files = (Get-ChildItem -force -recurse $documentFolder -include *.doc, *.docx).fullname
$files | Out-File (Join-Path $PSScriptRoot "\documents.txt")
$i=0; Get-Content $documentFile -ReadCount 100 | %{$i++; $_ | Out-File (Join-Path $PSScriptRoot "\FileLists\documents_$i.txt")}
}
Then i create the ComObject with the DisplayAlerts flag set to 0 (i thought this would fix it. It didnt)
$word = New-Object -ComObject word.application
$word.Visible = $false
$saveFormat = [Enum]::Parse([Microsoft.Office.Interop.Word.WdSaveFormat], "wdFormatText")
$word.DisplayAlerts = 0
After this, I loop through each file in each list, save the file as .txt to the temp folder, extract the text and generate an SQL INSERT statemnt:
foreach ($file in (Get-Content $list)) {
Try {
if ($file -like "*-*") {
Write-Output "Processing: $($file)"
$doc = $word.Documents.Open($file)
$fileName = [io.path]::GetFileNameWithoutExtension($file)
$fileName = $filename + ".txt"
$doc.SaveAs("$env:TEMP\$fileName", [ref]$saveFormat)
$doc.Close()
$4ID = $fileName.split('-')[-1].replace(' ', '').replace(".txt", "")
$text = Get-Content -raw "$env:TEMP\$fileName"
$text = $text.replace("'", "")
$query += "
('$text', $4ID),"
Remove-Item -Force "$env:TEMP\$fileName"
}
<# Upload to azure #>
$query = $query.Substring(0,$query.Length-1)
$query += ";"
$params = #{
'Database' = $TRIS5DATABASENAME
'ServerInstance' = $($AzureServerInstance.FullyQualifiedDomainName)
'Username' = $AdminLogin
'Password' = $InsecurePassword
'query' = $query
}
Invoke-Sqlcmd #params -ErrorAction "continue"
$query = "INSERT INTO tmp_CachedText (tCachedText, tOID)
VALUES "
}
Catch {
Write-Host "$($file) failed to process" -ForegroundColor RED;
}
}
Remove-Item -Force $list.FullName
ISSUES
As stated above, if something is wrong with one of the files or the document failed to open properly off a previous run the script starts failing. Everything in the loop throws errors, starting with:
You cannot call a method on a null-valued expression.
At D:\OneDrive\Scripts\Microsoft Cloud\CachedText-Extraction\CachedText-Extraction.ps1:226 char:13
+ $doc = $word.Documents.Open($file)
Basically what i want is a way to stop those errors from appearing by simply skipping the file if it has an error with the document. Alternatively, if there is a better way to extract text from document files using PowerShell and not using word that would be good too.
An example of one of the error messages:
This causes the file to be locked and execution to pause. The only way to get around it is to kill word, which then causes the rest of the script to fail.

Powershell downloading all possible files for a given day

I'm using Powershell for the first time to download the previous day's files from a webpage for a client. The web page is from a data logger than is on a vendor skid. The data logger always saves the files in the format yyMMdd##.CSV, where ## is the sequential number file for that given day (starting at 00). When viewing the webpage I have only seen the maximum number of CSV files for a given day as 1 (so, 8/31/17's file would be 17083100.CSV). I have got the Powershell code written to give me yesterday's file assuming that 00 is the only file for that day, but I was hoping there was a way I could either use a wildcard or for loop to download any additional files that may exist for the previous day. See the code below for what I currently have:
$a = "http://10.109.120.101/logs/Log1/"
$b = (get-date).AddDays(-1).ToString("yyMMdd") + "00.CSV"
$c = "C:\"
$url = "$a$b"
$WebClient = New-Object net.webclient
$path = "$c$b"
$WebClient.DownloadFile($url, $path)
try Something like this:
$Date=(get-date).AddDays(-1).ToString("yyMMdd")
$URLFormat ='http://10.109.120.101/logs/Log1/{0}{1:D2}.CSV'
$WebClient = New-Object net.webclient
#build destination path
$PathDest="C:\Temp\$Date"
New-Item -Path $PathDest -ItemType Directory -ErrorAction SilentlyContinue
1..99 | %{
$Path="$PathDest\{0:D2}.CSV" -f $_
$URL=$URLFormat -f $Date, $_
try
{
Write-Host ("Try to download '{0}' file to '{1}'" -f $URL, $Path)
$WebClient.DownloadFile($Path, $URL)
}
catch
{
}
}
$WebClient.Dispose()