PowerShell - Applying a looped function to subdirectories

I am new to Powershell and am struggling a bit. I have obtained an example of the sort of function I want to use and adapted it partially. What I want is for it to loop through each subdirectory of C:\Test\, and combine just the PDFs in each subdirectory together (leaving the resulting PDF in each subdirectory).
At the moment I can get it to comb through the subdirectories, but it then combines the contents of all subdirectories into one giant PDF in the top level directory, which is not what I want. I feel like maybe I need to use an array of sorts but I don't know Powershell well enough yet.
BTW this uses PDFSharp - a .Net library.
Function PDFCombine {
$filepath = 'C:\Test\'
$filename = '.\Combined' #<--- ???
$output = New-Object PdfSharp.Pdf.PdfDocument
$PdfReader = [PdfSharp.Pdf.IO.PdfReader]
$PdfDocumentOpenMode = [PdfSharp.Pdf.IO.PdfDocumentOpenMode]
foreach($i in (gci $filepath *.pdf -Recurse)) {
$input = New-Object PdfSharp.Pdf.PdfDocument
$input = $PdfReader::Open($i.fullname, $PdfDocumentOpenMode::Import)
$input.Pages | %{$output.AddPage($_)}
}
$output.Save($filename)
}

Your question was unclear about how many levels you need to go down. You can try this (untested). It goes one level down from $filepath, gets all PDF files in that folder and its subfolders, and combines them into Subfoldername-Combined.pdf:
Function PDFCombine {
$filepath = 'C:\Test\'
$PdfReader = [PdfSharp.Pdf.IO.PdfReader]
$PdfDocumentOpenMode = [PdfSharp.Pdf.IO.PdfDocumentOpenMode]
#Foreach subfolder(FIRST LEVEL ONLY!)
Get-ChildItem $filepath | Where-Object { $_.PSIsContainer } | Foreach-Object {
#Create new output pdf-file
$output = New-Object PdfSharp.Pdf.PdfDocument
$outfilepath = Join-Path $_.FullName "$($_.Name)-Combined.pdf"
#Find and add pdf files in subfolders
Get-ChildItem -Path $_.FullName -Filter *.pdf -Recurse | ForEach-Object {
#Use $pdf rather than $input, which is an automatic variable in PowerShell
$pdf = $PdfReader::Open($_.FullName, $PdfDocumentOpenMode::Import)
#AddPage returns the added page; discard it so it does not leak into the function output
$pdf.Pages | %{ [void]$output.AddPage($_) }
}
#Save
$output.Save($outfilepath)
}
}
So you should get this:
c:\Test\Folder1\Folder1-Combined.pdf #(should include all pages in Folder1 and ANY subfolders below)
c:\Test\Folder2\Folder2-Combined.pdf #(should include all pages in Folder2 and ANY subfolders below)
#etc.
If you need it to create a combined PDF for every subfolder (not only the first level), then you could try this (untested):
Function PDFCombine {
$filepath = 'C:\Test\'
$PdfReader = [PdfSharp.Pdf.IO.PdfReader]
$PdfDocumentOpenMode = [PdfSharp.Pdf.IO.PdfDocumentOpenMode]
#Foreach subfolder with pdf files
Get-ChildItem -Path $filepath -Filter *.pdf -Recurse | Group-Object DirectoryName | ForEach-Object {
#Create new output pdf-file
$output = New-Object PdfSharp.Pdf.PdfDocument
$outfilepath = Join-Path $_.Name "Combined.pdf"
#Add the pdf files found in this folder
$_.Group | ForEach-Object {
#Use $pdf rather than $input, which is an automatic variable in PowerShell
$pdf = $PdfReader::Open($_.FullName, $PdfDocumentOpenMode::Import)
$pdf.Pages | %{ [void]$output.AddPage($_) }
}
#Save
$output.Save($outfilepath)
#Remove output-object
Remove-Variable output
}
}
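To see what Group-Object DirectoryName hands to the loop above, here is a minimal sketch that leaves PdfSharp out entirely. The temporary folder layout and the file names are assumptions made up for illustration; the point is only that each group carries the directory path in .Name and that directory's files in .Group.

```powershell
# Build a small throwaway tree: two folders holding three placeholder .pdf files.
$root = Join-Path ([System.IO.Path]::GetTempPath()) ("pdfgroup-" + [guid]::NewGuid())
$dirA = Join-Path $root 'A'
$dirB = Join-Path $root 'B'
$null = New-Item -ItemType Directory -Path $dirA, $dirB -Force
$null = New-Item -ItemType File -Path (Join-Path $dirA 'one.pdf'), (Join-Path $dirA 'two.pdf'), (Join-Path $dirB 'three.pdf')

# One group per directory: .Name is the directory path, .Group holds its files.
$groups = Get-ChildItem -Path $root -Filter *.pdf -Recurse -File | Group-Object DirectoryName

foreach ($g in $groups) {
    # This is where a per-folder output path such as Combined.pdf would be built.
    $outFile = Join-Path $g.Name 'Combined.pdf'
    '{0} file(s) -> {1}' -f $g.Count, $outFile
}

# Clean up the throwaway tree.
Remove-Item -Path $root -Recurse -Force
```

Because the grouping happens before any PDF is opened, each folder gets its own output document no matter how deep it sits below the root.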

Not tested:
Function PDFCombine {
$filepath = 'C:\Test\'
$filename = 'Combined.pdf'
$output = New-Object PdfSharp.Pdf.PdfDocument
$PdfReader = [PdfSharp.Pdf.IO.PdfReader]
$PdfDocumentOpenMode = [PdfSharp.Pdf.IO.PdfDocumentOpenMode]
$lastdir = $null
foreach($i in (gci $filepath *.pdf -Recurse | Sort-Object DirectoryName)) {
# Directory changed: save what was collected for the previous one and start a new document
if ($lastdir -and $lastdir -ne $i.DirectoryName) {
$output.Save((Join-Path $lastdir $filename))
$output = New-Object PdfSharp.Pdf.PdfDocument
}
$lastdir = $i.DirectoryName
$pdf = $PdfReader::Open($i.FullName, $PdfDocumentOpenMode::Import)
$pdf.Pages | %{ [void]$output.AddPage($_) }
}
# Save the document for the last directory
if ($lastdir) { $output.Save((Join-Path $lastdir $filename)) }
}

Related

Calling a Function During Array Creation to export contents of folder containing Zip files

I am having trouble trying to call a function for a script that I'm using to build a list of zip files in a folder on my PC. The final CSV I need is to create is a list of the zip files with their uncompressed sizes. Here is what I have so far (compiled from several posts):
Function to get the uncompressed size:
function Get-UncompressedZipFileSize {
param (
$Path
)
$shell = New-Object -ComObject shell.application
$zip = $shell.NameSpace($Path)
$size = 0
foreach ($item in $zip.items()) {
if ($item.IsFolder) {
$size += Get-UncompressedZipFileSize -Path $item.Path
} else {
$size += $item.size
}
}
[System.Runtime.InteropServices.Marshal]::ReleaseComObject([System.__ComObject]$shell) | Out-Null
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
return $size
}
Here is my Array Creation:
$arr = @()
gci C:\zips -recurse | ? {$_.PSIsContainer -eq $False} | % {
$obj = New-Object PSObject
$obj | Add-Member NoteProperty Name $_.Name
$obj | Add-Member NoteProperty FullPath $_.FullName
$arr += $obj
}
$arr | Export-CSV -notypeinformation c:\zips
I'm stuck at adding a new member to each object in my array that calls the Get-UncompressedZipFileSize function and passes that size back as a new column in my CSV. Is something like this even possible?
Here is an alternative using the ZipFile class. The SizeConvert class is inspired by this answer. The output of Get-ZipFileSize is the absolute path of the zip file, its compressed and expanded sizes, and friendly formatted versions of both (i.e.: 7.88 MB instead of 8262942).
using namespace System.IO
using namespace System.IO.Compression
using namespace System.Linq
function Get-ZipFileSize {
[cmdletbinding()]
param(
[parameter(ValueFromPipelineByPropertyName)]
[string] $FullName
)
begin {
if(-not $IsCoreCLR) {
Add-Type -AssemblyName System.IO.Compression.FileSystem
}
class SizeConvert {
static [string[]] $Suffix = "B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"
static [string] ToFriendlySize([int64] $Length, [int] $DecimalPoints) {
$idx = 0
while ($Length -ge 1kb) {
$Length /= 1kb
$idx++
}
return '{0} {1}' -f [math]::Round($Length, $DecimalPoints), [SizeConvert]::Suffix[$idx]
}
}
}
process {
try {
$zip = [ZipFile]::OpenRead($FullName)
$expanded = [Enumerable]::Sum([Int64[]] $zip.Entries.Length)
$compressed = [Enumerable]::Sum([int64[]] $zip.Entries.CompressedLength)
[pscustomobject]@{
FilePath = $FullName
RawExpanded = $expanded
RawCompressed = $compressed
FormattedExpanded = [SizeConvert]::ToFriendlySize($expanded, 2)
FormattedCompressed = [SizeConvert]::ToFriendlySize($compressed, 2)
}
}
catch {
$PSCmdlet.WriteError($_)
}
finally {
if($zip -is [System.IDisposable]) {
$zip.Dispose()
}
}
}
}
Get-ChildItem -Filter *.zip -Recurse | Get-ZipFileSize | Export-Csv ....
To make this simpler, since you're only calling your function to get the size of the current folder (zip), you can use a Calculated Property for this:
$Path = "C:\Zips"
Get-ChildItem -Path $Path -Directory -Recurse |
Select-Object -Property Name, FullName,
@{
Name = "Size"
Expression = {
Get-UncompressedZipFileSize -Path $_.FullName
}
} | Export-Csv -Path "$Path\zip.csv" -Force -NoTypeInformation -Append
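As a minimal, self-contained sketch of how a calculated property behaves, the example below swaps the zip-size function for a hypothetical stand-in named Get-NameLength and uses made-up input objects, so it can be run without any zip files on disk.

```powershell
# Hypothetical stand-in for a function such as Get-UncompressedZipFileSize.
function Get-NameLength { param([string] $Name) $Name.Length }

# Made-up input objects that only carry a Name property.
$items = 'alpha.zip', 'be.zip' | ForEach-Object { [PSCustomObject]@{ Name = $_ } }

# A calculated property is a hashtable with a Name and an Expression script block.
# Inside Expression, $_ is the object currently passing through Select-Object.
$result = $items | Select-Object -Property Name, @{
    Name       = 'Size'
    Expression = { Get-NameLength -Name $_.Name }
}

$result
```

Each output object now has a Size column next to Name, which is the shape Export-Csv needs to write one row per item.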
On another note, if you ever find yourself explicitly adding to an array, take advantage of PowerShell's pipeline streaming.
$Path = "C:\Zips"
Get-ChildItem -Path $Path -Directory -Recurse |
ForEach-Object -Process {
[PSCustomObject]@{
Name = $_.Name
FullPath = $_.FullName
Size = Get-UncompressedZipFileSize -Path $_.FullName
} | Export-Csv -Path "$Path\zip.csv" -Force -NoTypeInformation -Append
}
Adding to a fixed-size array with += is slow and computationally expensive, especially for a large directory: because the array cannot grow, every += creates a new array and copies the existing elements into it. An alternative would be an ArrayList but, in this case (and in most cases), it's not needed.
Get-ChildItem also includes a -Directory switch to return only folders. It was introduced in PowerShell 3.0.
I would also recommend using -Filter to match the file extension of the compressed folders so you don't run into any issues.
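The point about arrays can be sketched as follows. The object shape and the file names are assumptions for illustration; the first option lets the pipeline collect the output, and the second uses a generic List (a typed alternative to ArrayList) for cases where items really must be added one at a time.

```powershell
# Option 1: let the pipeline collect the objects; no explicit array is needed.
$streamed = 1..5 | ForEach-Object {
    [PSCustomObject]@{ Name = "file$_.zip"; Size = $_ * 1KB }
}

# Option 2: a generic List grows in place, unlike a fixed-size array extended with +=.
$list = [System.Collections.Generic.List[object]]::new()
foreach ($n in 1..5) {
    $list.Add([PSCustomObject]@{ Name = "file$n.zip"; Size = $n * 1KB })
}

$streamed.Count   # 5
$list.Count       # 5
```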

Get directory name, folder name and count of all files

I am trying to create a script that reads each folder name in a directory, counts the zip files in each folder, and then counts the files in each zip. The output needs to be written to an output file.
I came up with the below:
$ZipRoot = 'C:\Users\Main Folder'
$ZipFiles = Get-ChildItem -Path $ZipRoot -Recurse -Filter '*.zip'
$Shell = New-Object -ComObject Shell.Application
$Results = foreach( $ZipFile in $ZipFiles ){
$FileCount = $Shell.NameSpace($ZipFile.FullName).Items() |
Measure-Object |
Select-Object -ExpandProperty Count
[pscustomobject]@{
FullName = $ZipFile.FullName
FileCount = $FileCount
}
}
$Results |
Export-Csv -Path 'C:\Users\mlkstq\Desktop\FFNS\ZipReport.csv' -NoTypeInformation
Output
Fullname Filecount
C:\Users\Main Folder\Subfolder1\Zip1 3
C:\Users\Main Folder\Subfolder2\Zip2 5
The problem is that I am having trouble getting the subfolder name into the output file. I also want to take a substring of the subfolder name to get a valid name. Whatever I try fails.
If I've got you right I'd do it this way:
$ZipRoot = 'C:\Users\Main Folder'
$Shell = New-Object -ComObject Shell.Application
$subFolderList = Get-ChildItem -Path $ZipRoot -Recurse -Directory
$Result = foreach ($subFolder in $subFolderList) {
$zipFileList = Get-ChildItem -Path $subFolder.FullName -File -Filter *.zip
foreach ($ZipFile in $zipFileList) {
[PSCustomObject]@{
subFolder = $subFolder.FullName
zipFilesCount = $zipFileList.Count
zipFile = $ZipFile.Name
fileCount = $Shell.NameSpace($zipFile.FullName).Items().Count
}
}
}
$Result | Format-Table -AutoSize
$Result | Export-Csv -Path 'C:\Users\mlkstq\Desktop\FFNS\ZipReport.csv' -NoTypeInformation
In my opinion, though, it does not look good to have the zip file count of a subfolder repeated on every line for that subfolder.
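For the subfolder name and the substring the question asks about, a minimal sketch could look like this. The path is built under the temp folder purely for illustration, the file does not need to exist, and the six-character cut-off is an assumption, since the question does not say what a valid name looks like.

```powershell
# Hypothetical path standing in for one entry of the zip file list.
$base    = Join-Path ([System.IO.Path]::GetTempPath()) 'Main Folder'
$zipFile = [System.IO.FileInfo](Join-Path (Join-Path $base 'Subfolder1') 'Zip1.zip')

# Name of the folder that directly contains the zip file.
$subFolderName = $zipFile.Directory.Name

# Keep at most the first 6 characters; Math.Min guards against shorter names.
$shortName = $subFolderName.Substring(0, [Math]::Min(6, $subFolderName.Length))

$subFolderName   # Subfolder1
$shortName       # Subfol
```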

Powershell - Change metadata from the properties of the file based on the name of the file

Ok, so here is my question:
I have a file with a name like this: IMG_20191215_201811.jpg
What I want is to take the 201811 part, which represents the time, and put it in the file's creation date metadata. Maybe the whole thing, date included: 20191215_201811.
What is the best approach to do this?
## Q:\Test\2019\05\19\SO_56211626.ps1
$Directory = "C:\TestFolder"
foreach ($file in (Get-ChildItem -Path $Directory -Filter *.pdf)){
if($File.BaseName -match '_(\d{4}-\d{2}-\d{2})(_\d)?$'){
$date_from_file= (Get-Date $Matches[1])
$file.CreationTime = $date_from_file
$file.LastAccessTime = $date_from_file
$file.LastWriteTime = $date_from_file
$file | Select-Object Name,CreationTime,LastAccessTime,LastWriteTime
}
}
Or maybe this?
$Directory = $env:TEMP
$DateFormat = "yyyy-MM-dd"
# create some test files
$TestFileList = @(
'FileA_2017-10-15.pdf'
'FileB_2016-04-08.pdf'
'FileC_2018-01-30.pdf'
'FileD_2019-09-09_1.pdf'
'FileE_2015-05-05_2.pdf'
)
foreach ($TFL_Item in $TestFileList)
{
$Null = New-Item -Path $Directory -Name $TFL_Item -ItemType File -Force
}
$FileList = Get-ChildItem -LiteralPath $Directory -Filter '*.pdf' -File
foreach ($FL_Item in $FileList) {
# removed split, added regex match to work with ever-growing list of variant file names
$Null = $FL_Item.BaseName -match '_(?<DateString>\d{4}-\d{2}-\d{2})'
$DateString = $Matches.DateString
$date_from_file = [datetime]::ParseExact($DateString, $DateFormat, $Null)
$FL_Item.CreationTime = $date_from_file
$FL_Item.LastWriteTime = $date_from_file
$FL_Item.LastAccessTime = $date_from_file
# show the resulting datetime info
'=' * 20
$CurrentFileInfo = Get-Item -LiteralPath $FL_Item.FullName
$CurrentFileInfo.FullName
$CurrentFileInfo.CreationTime
$CurrentFileInfo.LastWriteTime
$CurrentFileInfo.LastAccessTime
}
P.S.: I'm a newbie at this stuff, so don't jump on me, as I don't even know what is what and this kind of stuff is very heavy on my brain, otherwise .... I wouldn't have asked you guys :)
Thanks,
Bogdan
P.S.2: Kudos to JadonR
The date format in the name of the file matters. In your example code, you have two formats:
yyyymmdd_hhmmss
and
yyyy-mm-dd
Each has to be processed separately, with a different RegEx. Your original RegEx won't work the way I think you expected it to work, so using sub-strings may be more efficient for you. Nevertheless, here is code that works with your first example.
$Directory="C:\TestFolder"
$Things=Get-ChildItem -Path $Directory -Filter "*.jpg"
# I do this because I don't use the "ForEach" statement
$Things=@($Things)
$Regex=[System.Text.RegularExpressions.Regex]::new("IMG_(\d{4})(\d{2})(\d{2})_(\d{2})(\d{2})(\d{2})\.",[System.Text.RegularExpressions.RegexOptions]::None)
for ($i=0; $i -lt $Things.Count; $i++) {
$Matches=$RegEx.Matches($Things[$i].Name)
if ($Matches.Count -gt 0) {
$CreationDate=Get-Date -Year $Matches[0].Groups[1].Value -Month $Matches[0].Groups[2].Value -Day $Matches[0].Groups[3].Value -Hour $Matches[0].Groups[4].Value -Minute $Matches[0].Groups[5].Value -Second $Matches[0].Groups[6].Value
$Things[$i].CreationTime=$CreationDate
}
else { Write-Warning "Skipped $($Things[$i].Name)" }
}
# Now that you have changed the creation time, you need to reload the directory
Get-ChildItem -Path $Directory -Filter "*.jpg" | Select-Object -Property Name,CreationTime,LastAccessTime,LastWriteTime
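An alternative to assembling the date from six capture groups is to parse the whole stamp in one call. This is only a sketch under the assumption that the names always follow the IMG_yyyyMMdd_HHmmss pattern from the question; the variable names are made up for illustration.

```powershell
$name = 'IMG_20191215_201811.jpg'
$invariant = [System.Globalization.CultureInfo]::InvariantCulture

# Pull out the "yyyyMMdd_HHmmss" part of the name with a named capture group.
if ($name -match '^IMG_(?<stamp>\d{8}_\d{6})\.') {
    # ParseExact throws if the text does not fit the format, e.g. month 13.
    $taken = [datetime]::ParseExact($Matches.stamp, 'yyyyMMdd_HHmmss', $invariant)
    $taken.ToString('yyyy-MM-dd HH:mm:ss', $invariant)   # 2019-12-15 20:18:11
}

# The parsed value can then be assigned to a real file, for example:
# (Get-Item -LiteralPath $path).CreationTime = $taken
```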
I found I was needing to update file date/time metadata all the time, so I put my solution in a function:
Function SingleItemCopyAttributes
{
[cmdletbinding()]
Param(
[Parameter(Mandatory=$true,Position=0)]
[string]$Source,
[Parameter(Mandatory=$true,Position=1)]
[string]$Dest,
[Parameter(Mandatory=$false,Position=2)]
[system.text.stringbuilder]$ErrorLog=$null
)
$S = Get-Item -LiteralPath $Source
$D = Get-Item -LiteralPath $Dest
Try
{
"CreationTime","LastWriteTime","LastAccessTime" | ForEach { $D.$_ = $S.$_ }
}
Catch
{
If ($ErrorLog -ne $null)
{
$ErrorLog.AppendLine("Error setting attributes for '$Dest' `$S.CreationTime=$($S.CreationTime) `$S.LastWriteTime=$($S.LastWriteTime) `$S.LastAccessTime=$($S.LastAccessTime)") | Out-Null
}
Write-Host "Error setting attributes for '$Dest'"
}
}
EDIT: The function is put into a file SingleItemCopyAttributes.ps1 which I execute to load it into memory. I also can add that function to the top of my script and then call it below in loop where I first copy the item, and then "copy" the attributes with the function above, like this:
$SourceDir = #Define Source Dir here
$DestDir = #Define Dest Dir here
#Don't forget to create $DestDir if you need to with
#mkdir $DestDir
gci $SourceDir | % {
$sf = $_.FullName
$df = $_.FullName.Replace($SourceDir,$DestDir)
copy-item $sf $df
SingleItemCopyAttributes $sf $df
}
If you want to copy all the files/folders recursively
Hope this helps you out.

Improve performance when searching for a string within multiple word files

I have drafted a PowerShell script that searches for a string among a large number of Word files. The script is working fine, but I have around 1 GB of data to search through and it is taking around 15 minutes.
Can anyone suggest any modifications I can do to make it run faster?
Set-StrictMode -Version latest
$path = "c:\Tester1"
$output = "c:\Scripts\ResultMatch1.csv"
$application = New-Object -comobject word.application
$application.visible = $False
$findtext = "Roaming"
$charactersAround = 30
$results = @()
Function getStringMatch
{
For ($i=1; $i -le 4; $i++) {
$j="D"+$i
$finalpath=$path+"\"+$j
$files = Get-Childitem $finalpath -Include *.docx,*.doc -Recurse | Where-Object { !($_.psiscontainer) }
# Loop through all *.doc files in the $path directory
Foreach ($file In $files)
{
$document = $application.documents.open($file.FullName,$false,$true)
$range = $document.content
If($range.Text -match ".{$($charactersAround)}$($findtext).{$($charactersAround)}"){
$properties = @{
File = $file.FullName
Match = $findtext
TextAround = $Matches[0]
}
$results += New-Object -TypeName PsCustomObject -Property $properties
$document.close()
}
}
}
If($results){
$results | Export-Csv $output -NoTypeInformation
}
$application.quit()
}
getStringMatch
import-csv $output
As mentioned in comments, you might want to consider using the OpenXML SDK library (you can also get the newest version of the SDK on GitHub), since it's way less overhead than spinning up an instance of Word.
Below I've turned your current function into a more generic one, using the SDK and with no dependencies on the caller/parent scope:
function Get-WordStringMatch
{
param(
[Parameter(Mandatory,ValueFromPipeline)]
[System.IO.FileInfo[]]$Files,
[string]$FindText,
[int]$CharactersAround
)
begin {
# import the OpenXML library
Add-Type -Path 'C:\Program Files (x86)\Open XML SDK\V2.5\lib\DocumentFormat.OpenXml.dll' |Out-Null
# make a "shorthand" reference to the word document type
$WordDoc = [DocumentFormat.OpenXml.Packaging.WordprocessingDocument] -as [type]
# construct the regex pattern
$Pattern = ".{$CharactersAround}$([regex]::Escape($FindText)).{$CharactersAround}"
}
process {
# loop through all the *.doc(x) files
foreach ($File In $Files)
{
# open document, wrap content stream in streamreader
$Document = $WordDoc::Open($File.FullName, $false)
$DocumentStream = $Document.MainDocumentPart.GetStream()
$DocumentReader = New-Object System.IO.StreamReader $DocumentStream
try {
# read entire document
if($DocumentReader.ReadToEnd() -match $Pattern)
{
# got a match? output our custom object
New-Object psobject -Property @{
File = $File.FullName
Match = $FindText
TextAround = $Matches[0]
}
}
}
finally {
# Clean up after every file, not just the last one
$DocumentReader.Dispose()
$DocumentStream.Dispose()
$Document.Dispose()
}
}
}
}
Now that you have a nice function that supports pipeline input, all you need to do is gather your documents and pipe them to it!
# variables
$path = "c:\Tester1"
$output = "c:\Scripts\ResultMatch1.csv"
$findtext = "Roaming"
$charactersAround = 30
# gather the files
$files = 1..4|ForEach-Object {
$finalpath = Join-Path $path "D$_"
Get-Childitem $finalpath -Recurse | Where-Object { !($_.PsIsContainer) -and @('.docx','.doc') -contains $_.Extension }
}
# run them through our new function
$results = $files |Get-WordStringMatch -FindText $findtext -CharactersAround $charactersAround
# got any results? export it all to CSV
if($results){
$results |Export-Csv -Path $output -NoTypeInformation
}
Since all of our components now support pipelining, you could do it all in one go:
1..4|ForEach-Object {
$finalpath = Join-Path $path "D$_"
Get-Childitem $finalpath -Recurse | Where-Object { !($_.PsIsContainer) -and @('.docx','.doc') -contains $_.Extension }
} |Get-WordStringMatch -FindText $findtext -CharactersAround $charactersAround |Export-Csv -Path $output -NoTypeInformation
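The pattern built in the function can be tried on a plain string first. The sketch below uses a made-up sample sentence and one deliberate variation, which is my assumption rather than part of the answer: {0,N} instead of {N}, so that a hit closer than N characters to the start or end of the text still matches.

```powershell
$findText = 'Roaming'
$charactersAround = 10
$text = 'Settings are stored under AppData\Roaming for each user profile.'

# Escape the search text so any regex metacharacters in it are matched literally.
# {0,N} takes up to N characters of context on each side instead of exactly N.
$pattern = ".{0,$charactersAround}$([regex]::Escape($findText)).{0,$charactersAround}"

if ($text -match $pattern) {
    # $Matches[0] is the search text plus the surrounding context.
    $Matches[0]
}
```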

How do I non-recursively gather folder sizes and their names? (Powershell)

Basically what I'm trying to do is gather each user's folder size from their network folder and then export that to a .csv. The directory structure looks something like this: network:\Department\user...User's-stuff
The script I have right now gets the department file name and the user's folder size, but not the user's name (folder name in the department). As for the TimeStamp, I'm not sure it's working correctly. It's meant to make a timestamp when it starts on the users in the next department so basically, all users in the same department will have the same timestamp.
This is what I have so far:
$root = "network"
$container= @()
$place = "C:\temp\"
$file = "DirectoryReport.csv"
Function Get-FolderSize
{
BEGIN{$fso = New-Object -comobject Scripting.FileSystemObject}
PROCESS
{
$prevDept = (Split-Path $path -leaf)
$path = $input.fullname
$folder = $fso.GetFolder($path)
$Volume = $prevDept + "-users"
$user = $folder.name #can't figure this part out...
$size = $folder."size(MB)"
if ( (Split-Path $path -leaf) -ne $prevDept)
{
$time = Get-Date -format M/d/yyy" "HH:mm #Probably wrong too..
}
return $current = [PSCustomObject]@{'Path' = $path; 'Users' = $user; 'Size(MB)' = ($size /1MB ); 'Volume' = $Volume; 'TimeStamp' = $time;}
}
}
$container += gci $root -Force -Directory -EA 0 | Get-FolderSize
$container
#Creating the .csv path
$placeCSV = $place + $file
#Checks if the file already exists
if ((test-path ($placeCSV)) -eq $true)
{
$file = "DirectoryReport" + [string](Get-Date -format MM.d.yyy.#h.mm.sstt) + ".csv"
rename-item -path $placeCSV -newname $file
$placeCSV = $place + $file
}
#Exports the CSV file to desired folder
$container | epcsv $placeCSV -NoTypeInformation -NoClobber
But in the CSV file the user and the timestamp are wrong. Thanks for any/all help
This really seems to be doing it the hard way. Not using Get-ChildItem for this seems a little masochistic to me, so I'm going to use that cmdlet instead of creating a COM object.
I am a little confused as to why you wouldn't want to recurse for size, but ok, we'll go that route. This will get you your folders sizes, in MB.
#Get a listing of department folders
$Depts = GCI $root -force -Directory
#Collect the results for all departments in one list
$Users = @()
#Loop through them
ForEach($Dept in $Depts){
$Timestamp = Get-Date -Format "M/d/yyy HH:mm"
#Loop through each user for the current department
GCI $Dept.FullName -Directory |%{
$Users += [PSCustomObject]@{
User=$_.Name
Path=$_.FullName
"Size(MB)"=(GCI $_.FullName -File|Measure-Object -Sum Length).Sum/1MB
Volume="$($Dept.Name)-Users"
TimeStamp=$Timestamp
}
}
}
#Rename output file if it exists
If(Test-Path "C:\Temp\DirectoryReport.csv"){
Rename-Item "C:\Temp\DirectoryReport.csv" "DirectoryReport.$(Get-Date -format MM.d.yyy.#h.mm.sstt).csv"
}
#Output file
$Users | Export-Csv "C:\Temp\DirectoryReport.csv" -NoTypeInformation
If you want the total size of all files within each user's folder, including files within subfolders, make the size calculation recursive by changing GCI $_.FullName -File to GCI $_.FullName -File -Recurse in the "Size(MB)" line, and that should have you good to go.
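The difference between the two size calculations can be checked on a throwaway folder. Everything in this sketch (the folder, the file names, the file sizes) is made up for illustration; it only shows that the Sum property of the Measure-Object result is a number that can be divided by 1MB, and what -Recurse adds.

```powershell
# Throwaway folder: a 1 MB file at the top level and a 0.5 MB file in a subfolder.
$folder = Join-Path ([System.IO.Path]::GetTempPath()) ("size-" + [guid]::NewGuid())
$sub    = Join-Path $folder 'sub'
$null   = New-Item -ItemType Directory -Path $sub -Force
[System.IO.File]::WriteAllBytes((Join-Path $folder 'a.bin'), (New-Object byte[] 1MB))
[System.IO.File]::WriteAllBytes((Join-Path $sub 'b.bin'), (New-Object byte[] 512KB))

# Top level only: counts a.bin but not sub\b.bin.
$topLevelMB = (Get-ChildItem $folder -File | Measure-Object -Property Length -Sum).Sum / 1MB

# With -Recurse: counts files at every depth below the folder.
$recursiveMB = (Get-ChildItem $folder -File -Recurse | Measure-Object -Property Length -Sum).Sum / 1MB

$topLevelMB
$recursiveMB

# Clean up the throwaway folder.
Remove-Item $folder -Recurse -Force
```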