I've built a small report which essentially just does row counts for Excel files within a share. However, there is now a requirement for the report to display the per-directory counts in a specific order. I cannot fathom how I'd go about that.
#Search location
$searchinfolder = '\\Report\testing\'
#Create the array.
$data = @()
#Get child items that are files and whose directory is not "Positions".
$Files = Get-ChildItem -Path $searchinfolder -Recurse | Where-Object { -not $_.PSIsContainer -and $_.Directory.Name -ne "Positions" }
Foreach ($File in $Files) {
    #Main section: count rows in each file after removing the top 2 and last 3 lines.
    $fileStats = Get-Content $File.FullName | Select-Object -Skip 2 | Select-Object -SkipLast 3 | Measure-Object -Line
    $linesInFile = $fileStats.Lines - 1
    #Add one back so the count starts at 1 rather than 0.
    $linesInFile++
    #Only keep files with data in them.
    if ($linesInFile -gt 0) {
        $data += @(
            [pscustomobject]@{
                Filename  = $File.FullName;
                Rowcount  = $linesInFile;
                Directory = $File.Directory.Name
            })
    }
}
#Group by directory and total the row counts for each directory.
$data = $data | Group-Object Directory | ForEach-Object {
    [PSCustomObject]@{
        Directory = $_.Group.Directory | Get-Unique
        Rowcount  = ($_.Group.Rowcount | Measure-Object -Sum).Sum
    }
}
So for example, let's say the folder structure we're scraping is Cat, Dog, Goat, Programmer, Lama, Mouse.
Let's say all the folders but one contain files. How would I go about having the $data array arranged in a specific order of my choosing? Furthermore, how would you go about setting the order and simply skipping to the next assigned directory if the current directory is empty?
See below my attempt at pseudo-code trying to explain this.
Foreach ($item in $data) {
    if ($item.directory -eq "cat") { $item = $array[0] }
    if ($item.directory -eq "dog") { $item = $array[1] }
    if ($item.directory -eq "goat") { $item = $array[2] }
    if ($item.directory -eq "Programmer") { $item = $array[3] }
    if ($item.directory -eq "Lama") { $item = $array[4] }
    if ($item.directory -eq "Mouse") { $item = $array[5] }
}
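For what it's worth, here is one sketch of that ordering, assuming the directory names above; it sorts the grouped results by each directory's position in a hand-written order list:

# Hypothetical order list - adjust the names to match your folders.
$order = 'Cat','Dog','Goat','Programmer','Lama','Mouse'
# Sort the grouped results by each directory's position in $order.
# Note: [array]::IndexOf is case-sensitive; unlisted directories return -1
# and sort first. Empty directories never made it into $data in the first
# place, so they are skipped automatically.
$data = $data | Sort-Object { [array]::IndexOf($order, $_.Directory) }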
I am tinkering with Workflows to process many files concurrently. I have written a rough piece of code prior to the actual implementation.
The problem is that the loop never ends. The idea of the script is to get a list of files into an array (let's call it "Original"), create new arrays (let's call them "LoopingArray") to process the files in batches of x files, remove the processed items from the array to create a new array to process, and so on, until the original array is empty.
For each item a txt file is created, so when the "Original" array is empty, the Do..Until loop should stop. But it doesn't; it keeps creating files over and over. What am I doing wrong?
Workflow Test-Workflow {
    $SourceFolder = "c:\test\whatever"
    $Files = [System.IO.Directory]::EnumerateFiles($SourceFolder, '*.*', 'AllDirectories')
    $ArrayCount = $Files | Measure-Object | Select-Object Count
    Do {
        $NewArray = $Files | Select-Object -First 20
        $Files = $Files | Where-Object { $NewArray -notcontains $_ }
        $ArrayCount = $Files | Measure-Object | Select-Object Count
        ForEach -Parallel ($it in $NewArray)
        {
            $Name = Get-Random
            $Filepath = "C:\temp\test\" + "$Name" + ".txt"
            $it | Out-File -FilePath $Filepath
        }
    } Until ($ArrayCount.Count -eq "0")
}
Thanks!
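For reference, here is the same batch-and-remove idea as a plain, non-workflow sketch (paths assumed from the question). Materializing the lazy enumerator into a real array with @() up front makes the count checks behave predictably:

# Non-workflow sketch of the same batching pattern (assumed paths).
$SourceFolder = 'c:\test\whatever'
# Materialize the lazy enumerator into a real array up front.
$Files = @([System.IO.Directory]::EnumerateFiles($SourceFolder, '*.*', 'AllDirectories'))
while ($Files.Count -gt 0) {
    # Take the next batch of 20 and drop it from the remaining list.
    $Batch = @($Files | Select-Object -First 20)
    $Files = @($Files | Where-Object { $Batch -notcontains $_ })
    foreach ($it in $Batch) {
        # One output file per processed item, as in the original.
        $Name = Get-Random
        $it | Out-File -FilePath ("C:\temp\test\" + $Name + ".txt")
    }
}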
I'm fairly new to PowerShell and I've not been able to find a definitive answer to my problem. I have a bunch of Excel files in different folders which are duplicates but have varying file names due to them being updated.
e.g.
015 Approved warranty - Turkey - Case-2019 08-1437015 (issue 3),
015 Approved warranty - Turkey - Case-2019 08-1437015 (final issue)
015 Approved warranty - Turkey - Case-2019 08-1437015
015 Approved warranty - Turkey - Case-2019 08-1437015 amended
I've tried different things, but now I know the easiest way is to filter the files; I just don't know the syntax. The anchor point will be the case number just after the date. I want to compare the case numbers against each other, keep only the newest file (by date modified), and delete the rest. Any guidance is appreciated.
#take files from folder
$dupesource = 'C:\Users\W_Brooker\Documents\Destination\2019\08'
#filter files by case number (7 digit number after date)
$files = Get-ChildItem $dupesource -Filter "08-aaaaaaa"
#If case number is the same keep newest file delete rest
foreach ($file in $files){
$file | Delete-Item - sort -property Datemodified |select -Last 1
}
A PowerShell-idiomatic solution is to combine multiple cmdlets in a single pipeline, in which Group-Object provides the core functionality: grouping duplicate files by the shared case number in the file name.
# Define the regex that matches a case number:
# A 7-digit number embedded in filenames that duplicates share.
$regex = '\b\d{7}\b'
# Enumerate all files and select only those whose name contains a case number.
Get-ChildItem -File $dupesource | Where-Object { $_.BaseName -match $regex } |
# Group the resulting files by shared embedded case number.
Group-Object -Property { [regex]::Match($_.BaseName, $regex).Value } |
# Process each group:
ForEach-Object {
# In each group, sort files by most recently updated first.
$_.Group | Sort-Object -Descending LastWriteTimeUtc |
# Skip the most recent file and delete the older ones.
Select-Object -Skip 1 | Remove-Item -WhatIf
}
The -WhatIf common parameter previews the operation. Remove it once you're sure it will do what you want.
This should do the trick:
$files = Get-ChildItem 'C:\Users\W_Brooker\Documents\Destination\2019\08' -Recurse

# create datatable to store file information in it
$dt = New-Object System.Data.DataTable
[void]$dt.Columns.Add('FileName',      [string]::Empty.GetType() )
[void]$dt.Columns.Add('CaseNumber',    [string]::Empty.GetType() )
[void]$dt.Columns.Add('FileTimeStamp', [DateTime]::MinValue.GetType() )
[void]$dt.Columns.Add('DeleteFlag',    [byte]::MinValue.GetType() )

# Step 1: Make inventory
foreach( $file in $files ) {
    if( !$file.PSIsContainer -and $file.Extension -like '.xls*' -and $file.Name -match '^.*\-\d+ *[\(\.].*$' ) {
        $row = $dt.NewRow()
        $row.FileName      = $file.FullName
        $row.CaseNumber    = $file.Name -replace '^.*\-(\d+) *[\(\.].*$', '$1'
        $row.FileTimeStamp = $file.LastWriteTime
        $row.DeleteFlag    = 0
        [void]$dt.Rows.Add( $row )
    }
}

# Step 2: Mark files to delete
$rows = $dt.Select('', 'CaseNumber, FileTimeStamp DESC')
$caseNumber = ''
foreach( $row in $rows ) {
    if( $row.CaseNumber -ne $caseNumber ) {
        $caseNumber = $row.CaseNumber
        Continue
    }
    $row.DeleteFlag = 1
    [void]$dt.AcceptChanges()
}

# Step 3: Delete files
$rows = $dt.Select('DeleteFlag = 1', 'FileTimeStamp DESC')
foreach( $row in $rows ) {
    $fileName = $row.FileName
    Remove-Item -Path $fileName -Force | Out-Null
}
Here's an alternative that leverages the PowerShell Group-Object cmdlet.
It uses a regex to match files on the case number, ignoring those that don't have one. See the screenshot at the bottom showing the test data (a collection of test xlsx files).
cls
#Assume that each file has an xlsx extension.
#Assume that a case number always looks like this: "Case-YYYY~XX-Z" where YYYY is 4 digits, ~ is a single space, XX is two digits, and Z is one-to-many digits
#make a list of xlsx files (recursive)
$files = Get-ChildItem -LiteralPath .\ExcelFiles -Recurse -Include *.xlsx

#$file is a System.IO.FileInfo object. Parse out the case number and add it to the $file object as a CaseNumber property
foreach ($file in $files)
{
    $Matches = $null
    $file.Name -match "(^.*)(Case-\d{4}\s{1}\d{2}-\d{1,})(.*\.xlsx$)" | Out-Null
    if ($Matches.Count -eq 4)
    {
        $caseNumber = $Matches[2]
        $file | Add-Member -NotePropertyName CaseNumber -NotePropertyValue $caseNumber
    }
    Else
    {
        #child folders will end up in this group too
        $file | Add-Member -NotePropertyName CaseNumber -NotePropertyValue "NoCaseNumber"
    }
}
#group the files by CaseNumber
$files | Group-Object -Property CaseNumber -OutVariable fileGroups | Out-Null
foreach ($fileGroup in $fileGroups)
{
    #skip folders and files that don't have a valid case number
    if ($fileGroup.Name -eq "NoCaseNumber")
    {
        continue
    }
    #for each group: sort files descending by LastWriteTime. The newest file is first, so skip it and remove the rest
    $fileGroup.Group | Sort-Object -Descending -Property LastWriteTime | Select-Object -Skip 1 | ForEach-Object { Remove-Item -LiteralPath $_.FullName -Force }
}
Test Data
I'm currently writing a script that checks each folder in a directory for the last time a file was written to it. I'm having trouble figuring out how to obtain the last time a file was written to each folder, as opposed to just retrieving the folder's creation date.
I've tried using PowerShell's recursive method, but couldn't figure out how to properly set it up. Right now, the script successfully prints the name of each folder to the Excel spreadsheet, and also prints the last write time of each folder, which is the incorrect information.
$row = 2
$column = 1
Get-ChildItem "C:\Users\Sylveon\Desktop\Test" | ForEach-Object {
    #FolderName
    $sheet.Cells.Item($row,$column) = $_.Name
    $column++
    #LastBackup
    $sheet.Cells.Item($row,$column) = $_.LastWriteTime
    $column++
    #Increment to next row and reset column
    $row++
    $column = 1
}
The current state of the script prints each folder name to the report, but gives the folder's creation date rather than the last time a file was written to that folder.
The following should work to get the most recent edit date of any file in the current directory.
Get-ChildItem | Sort-Object -Property LastWriteTime -Descending | Select-Object -first 1 -ExpandProperty "LastWriteTime"
Get-ChildItem gets items in your directory
Sort-Object -Property LastWriteTime -Descending sorts by write-time, latest first
Select-Object -first 1 -ExpandProperty "LastWriteTime" gets the first one in the list, then gets its write-time
I made this to get the data you're after. The $EditTimes line gives an empty string if the directory is empty, which is probably what's safest for Excel, but you could also default to something other than an empty string, like the directory's creation date:
$ChildDirs = Get-ChildItem | Where-Object { $_ -is [System.IO.DirectoryInfo] }
$EditNames = $ChildDirs | ForEach-Object Name
$EditTimes = $EditNames | ForEach-Object { @( (Get-ChildItem $_ | Sort-Object -Property LastWriteTime -Descending | Select-Object -First 1 LastWriteTime), '' -ne $null )[0] }
for ($i = 0; $i -lt $ChildDirs.Length; $i++) {
    Write-Output $EditNames[$i]
    Write-Output $EditTimes[$i]
}
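If you'd rather default to the directory's creation date instead of an empty string, a variant of the $EditTimes line might look like this (a sketch of the option mentioned above, working from $ChildDirs directly):

# Variant: fall back to the directory's CreationTime when it is empty.
$EditTimes = $ChildDirs | ForEach-Object {
    $newest = Get-ChildItem $_.FullName |
        Sort-Object -Property LastWriteTime -Descending |
        Select-Object -First 1
    if ($newest) { $newest.LastWriteTime } else { $_.CreationTime }
}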
To implement this for what you're doing, if I understand your question correctly, try the following:
$ChildDirs = Get-ChildItem | Where-Object { $_ -is [System.IO.DirectoryInfo] }
$EditNames = $ChildDirs | ForEach-Object Name
$EditTimes = $EditNames | ForEach-Object { @( (Get-ChildItem $_ | Sort-Object -Property LastWriteTime -Descending | Select-Object -First 1 LastWriteTime), '' -ne $null )[0] }
for ($i = 0; $i -lt $ChildDirs.Length; $i++) {
    #FolderName
    $sheet.Cells.Item($row, $column) = $EditNames[$i]
    $column++
    #LastBackup
    $sheet.Cells.Item($row, $column) = $EditTimes[$i]
    $row++
    $column = 1
}
If you're only looking at the first level of files in each folder, you can do it using a nested loop:
$row = 2
$column = 1
$folders = Get-ChildItem $directorypath
ForEach ($folder in $folders) {
    # start off with LastEdited set to the last write time of the folder itself
    $LastEdited = $folder.LastWriteTime
    # this 'dynamically' sets each folder's path
    $folderPath = $directoryPath + '\' + $folder.Name
    $files = Get-ChildItem $folderPath
    ForEach ($file in $files) {
        if ((Get-Date $file.LastWriteTime) -gt (Get-Date $LastEdited)) {
            $LastEdited = $file.LastWriteTime
        }
    }
    $sheet.Cells.Item($row,$column) = $folder.Name
    $column++
    $sheet.Cells.Item($row,$column) = $LastEdited
    $row++
    $column = 1
}
I am trying to compare multiple files against a single document. I have managed to make that part work; however, my issue is that I want to check whether the files exist before a comparison is run.
I.e. check if file A exists; if so, compare it against the master csv file; if not, continue on and check if file B exists; if so, compare it against the master csv, and so on.
My script so far:
$files = Get-Content -Path "H:\Compare\File Location\servername Files.txt"
$prod = "H:\compare\Results\master_SystemInfo.csv"

foreach ($file in $files) {
    If ((Test-Path -Path $file))
    {
        Write-Host "File exists, comparing against production"
        $content1 = Get-Content "H:\Compare\Results\$file"
        $content2 = Get-Content $prod
        $comparedLines = Compare-Object $content1 $content2 -IncludeEqual |
            Sort-Object { $_.InputObject.ReadCount }
        $lineNumber = 0
        $comparedLines | ForEach-Object {
            $pattern = ".*"
            if ($_.SideIndicator -eq "==" -or $_.SideIndicator -eq "=>")
            {
                $lineNumber = $_.InputObject.ReadCount
            }
            if ($_.InputObject -match $pattern)
            {
                if ($_.SideIndicator -ne "==")
                {
                    if ($_.SideIndicator -eq "=>")
                    {
                        $lineOperation = "prod"
                    }
                    elseif ($_.SideIndicator -eq "<=")
                    {
                        $lineOperation = "test"
                    }
                    [PSCustomObject] @{
                        Line = $lineNumber
                        File = $lineOperation
                        Text = $_.InputObject
                    }
                }
            }
        } | Export-Csv "h:\compare\Comparison Reports\Prod.vs.$file" -NoTypeInformation
    }
    Else
    { "File does not exist, aborting" ; return }
}
The comparison is working; I just need the file-existence check to take effect before running the comparison, as it is still spitting out results for files that don't exist.
Thank you very much,
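One thing worth checking (an assumption based on the paths in the script): Test-Path is handed the bare name from the txt file, so it tests relative to the current directory rather than H:\Compare\Results, where the files are actually read from. Building the full path first would look something like this sketch:

# Sketch: test the file where the script actually reads it from.
$fullPath = Join-Path 'H:\Compare\Results' $file
if (Test-Path -Path $fullPath) {
    $content1 = Get-Content $fullPath
    # ... comparison as above ...
}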
I have found the answer by altering the code. This time I'm just creating a txt file from the files in the folder first, so I don't need Test-Path. This now generates a file list from the folder, then compares each file against the master file and outputs multiple files, one for each comparison, saving each one as the original filename, i.e. "Prod.vs._SystemInfor.csv".
FYI: in the first line, abc123* is a placeholder I put in to look for specific server names within the folder and generate a file list based on those only. We have a number of servers, all with similar naming conventions; just the last 4 digits differ depending on where they are located.
Thanks
Working PowerShell script:
Get-ChildItem -File -Path H:\Compare\Results -Filter abc123* -Name | Out-File "H:\Compare\Results\Office Files.txt"
$officefiles = Get-Content -Path "H:\Compare\results\Office Files.txt"
$officeprod = "H:\compare\Results\master_SystemInfo.csv"

foreach ($officefile in $officefiles) {
    $content1 = Get-Content "H:\Compare\Results\$officefile"
    $content2 = Get-Content $officeprod
    $comparedLines = Compare-Object $content1 $content2 -IncludeEqual |
        Sort-Object { $_.InputObject.ReadCount }
    $lineNumber = 0
    $comparedLines | ForEach-Object {
        $pattern = ".*"
        if ($_.SideIndicator -eq "==" -or $_.SideIndicator -eq "=>")
        {
            $lineNumber = $_.InputObject.ReadCount
        }
        if ($_.InputObject -match $pattern)
        {
            if ($_.SideIndicator -ne "==")
            {
                if ($_.SideIndicator -eq "=>")
                {
                    $lineOperation = "prod"
                }
                elseif ($_.SideIndicator -eq "<=")
                {
                    $lineOperation = "test"
                }
                [PSCustomObject] @{
                    Line = $lineNumber
                    File = $lineOperation
                    Text = $_.InputObject
                }
            }
        }
    } | Export-Csv "h:\compare\Comparison Reports\Prod.vs.$officefile" -NoTypeInformation
}
So far I have a hash table with 2 values in it. Right now the code below exports all the unique lines and gives me a count of how many times each line was referenced across hundreds of xml files. This is one part.
I now need to find out which subfolder had the xml file containing each unique line referenced in the hash table. Is this possible?
$ht = @{}
Get-ChildItem -Recurse -Filter *.xml | Get-Content | %{ $ht[$_] = $ht[$_] + 1 }
$ht

# To export to CSV:
$ht.GetEnumerator() | Select-Object Key, Value | Export-Csv D:\output.csv
To get the file path into your output, you need to assign it to a variable in the first pipe.
Is this something similar to what you need?
$ht = @{}
Get-ChildItem -Recurse -Filter *.xml | %{ $path = $_.FullName; Get-Content $path } | % { $ht[$_] = $ht[$_] + $path + ";" }
The code above will return a hash table in "config line" = "list of paths" format, with each path terminated by a semicolon.
EDIT:
If you need to return three elements (unique line, count, and an array of paths where it was found), it gets more complicated. Here is code that returns an array of PSObjects, each containing the info for one unique line in the XML files.
$ht = @()
$files = Get-ChildItem -Recurse -Filter *.xml
foreach ($file in $files) {
    $path = $file.FullName
    $lines = Get-Content $path
    foreach ($line in $lines) {
        if ($match = $ht | Where-Object { $_.Line -eq $line }) {
            $match.Count = $match.Count + 1
            $match.Paths += $path
        } else {
            $ht += New-Object PSObject -Property @{
                Count = 1
                Paths = @(,$path)
                Line  = $line }
        }
    }
}
$ht
I'm sure it can be shortened and optimized, but hopefully it is enough to get you started.
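As one possible shortening (a sketch, untested against the original data), Group-Object can do the bookkeeping instead of the hand-rolled array:

# Emit one object per line per file, then group identical lines.
Get-ChildItem -Recurse -Filter *.xml | ForEach-Object {
    $path = $_.FullName
    Get-Content $path | ForEach-Object {
        [PSCustomObject]@{ Line = $_; Path = $path }
    }
} | Group-Object -Property Line | ForEach-Object {
    # One result object per unique line: the line, its total count,
    # and the distinct file paths it appeared in.
    [PSCustomObject]@{
        Line  = $_.Name
        Count = $_.Count
        Paths = @($_.Group.Path | Sort-Object -Unique)
    }
}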