Why is the other PowerShell script run twice? - powershell

I have a script that checks whether a disk has a certain amount of free space. If not, a pop-up appears asking for a yes or no. If yes, an alarm is set to 1 and then another script that deletes files from a folder runs. My issue is that it seems to delete twice the number of files specified in that script.
Main script:
$limit_low = 0.1 # low limit, 10%
$DiskD = Get-PSDrive D | Select-Object Used,Free | Write-Output
$DiskD_use = [math]::Round(($DiskD.Free / ($DiskD.Used + $DiskD.Free)),2)
if( $DiskD_use -le $limit_low ) {
    Write-Host "RDS server has too little space on disk D: $diskD_use < $limit_low" -ForegroundColor Red -BackgroundColor Yellow
    $ButtonType = 4
    $Timeout = 60
    $Confirmation = New-Object -ComObject wscript.shell
    $ConfirmationAnswer = $Confirmation.popup("Clear disk space?",$Timeout,"No space",$ButtonType)
    If( $ConfirmationAnswer -eq 6 ) {
        Write-Host "Running script Diskspace.ps1 under P:\backupscripts"
        & c:\dynamics\app\JDSend.exe "/UDP /LOG:c:\dynamics\app\Fixskick.log /TAG:Lunsc2:K_PROCESS_LARM_DISKUTRYMME "1""
        & P:\BackupScripts\Delete_archives_test.ps1 # here I call the other script
    } else {
        Write-Host "Do nothing"
        & c:\dynamics\app\JDSend.exe "/UDP /LOG:c:\dynamics\app\Fixskick.log /TAG:Lunsc2:K_PROCESS_LARM_DISKUTRYMME "0""
    }
}
Other script:
# List all txt files in the directory, sort them and select the first 10, then delete them
Get-ChildItem -Path c:\temp -File Archive*.txt | Sort-Object | Select-Object -First 10 | Remove-Item -Force
Cheers
EDIT
So it would be enough to enclose the first statement like this:
(Get-ChildItem -Path c:\temp -File Archive*.txt) | Sort-Object | Select-Object -First 10 | Remove-Item -Force
? Funny thing is, I tried to reproduce this today at home with no effect. It works as intended from here, even without the parentheses.
Furthermore, I am painfully ignorant about how to use the foreach statement, since you can't pipe from it :)
foreach ($file in $filepath) {$file} | Sort-Object | Select-Object -First 10 | Remove-Item -Force
I tried to put the sort and select parts inside the {} too, but nothing good came of it, as I'm stuck in the pipeline mindset and don't understand the foreach logic.

Your problem appears to be caused by how your pipeline works. [grin]
Think about what it does ...
1. read ONE fileinfo item
2. send it to the pipeline
3. change/add a file
4. continue the pipeline
That 3rd step will cause the file list to change ... and may result in a file being read again OR some other change in the list of files to work on.
There are two solutions that come to mind ...
1. Wrap the Get-ChildItem call in parens. That will force one read of the list before sending anything to the pipeline ... and that will ignore any changes caused by later pipeline stages.
2. Use a foreach loop. That will read the whole list and then iterate through the list one item at a time (see the sketch below).
The 2nd solution also has the benefit of being easier to debug, since the value of your current item only changes when explicitly modified. The current pipeline item changes at every pipeline stage ... and that is easy to forget. [grin]
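For illustration, here is a minimal sketch of both fixes applied to the delete script from the question (same folder and file pattern as above; otherwise untested):
# Fix 1: the parens force Get-ChildItem to finish enumerating before the pipeline starts
(Get-ChildItem -Path c:\temp -File Archive*.txt) |
    Sort-Object |
    Select-Object -First 10 |
    Remove-Item -Force

# Fix 2: collect the list first, then delete in a plain foreach loop (no deletes happen while enumerating)
$files = Get-ChildItem -Path c:\temp -File Archive*.txt | Sort-Object | Select-Object -First 10
foreach ($file in $files) {
    Remove-Item -LiteralPath $file.FullName -Force
}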

Related

PowerShell script has been running for days when doing a comparison

I have a PowerShell query that works fine for a smaller amount of data, but I am trying to run my CSV against a folder which has multiple folders and files within. The folder is nearly 800GB and contains 180 folders.
I want to see if each file in the CSV exists in the folder. I can manually search for a file within Windows and it does not take too long to return a result, but my CSV has 3000 rows and I do not wish to do this 3000 times. My script works fine for a smaller amount of data.
The script has been running for 6 days and has not generated a file with data yet; it is 0 KB. I am running it via Task Scheduler.
Script is below.
$myFolder = Get-ChildItem 'C:\Test\TestData' -Recurse -ErrorAction SilentlyContinue -Force
$myCSV = Import-Csv -Path 'C:\Test\differences.csv' | % {$_.'name' -replace "\\", ""}
$compare = Compare-Object -ReferenceObject $myCSV -DifferenceObject $myFolder
Write-Output "`n_____MISSING FILES_____`n"
$compare
Write-Output "`n_____MISSING FILES DETAILS____`n"
foreach($y in $compare){
    if($y.SideIndicator -eq "<="){
        write-output "$($y.InputObject) Is present in the CSV but not in Missing folder."
    }
}
I then created another script which runs the above script with an Out-File command, and runs it with Task Scheduler.
C:\test\test.ps1 | Out-File 'C:\test\Results.csv'
is there a better way of doing this?
Thanks
is there a better way of doing this?
Yes!
Add each file name on disk to a HashSet[string]
the HashSet type is SUPER FAST at determining whether it contains a specific value or not, much faster than Compare-Object
Loop over your CSV records, check if each file name exists in the set from step 1
# 1. Build our file name index using a HashSet
$fileNames = [System.Collections.Generic.HashSet[string]]::new()
Get-ChildItem 'C:\Test\TestData' -Recurse -ErrorAction SilentlyContinue -Force |ForEach-Object {
    [void]$fileNames.Add($_.Name)
}
# 2. Check each CSV record against the file name index
Import-Csv -Path 'C:\Test\differences.csv' |ForEach-Object {
    $referenceName = $_.name -replace '\\'
    if(-not $fileNames.Contains($referenceName)){
        "${referenceName} is present in CSV but not on disk"
    }
}
Another option is to use the hash set from step 1 in a Where-Object filter:
$csvRecordsMissingFromDisk = Import-Csv -Path 'C:\Test\differences.csv' |Where-Object { -not $fileNames.Contains(($_.name -replace '\\')) }

Trying to get a unique list of extensions in a directory with a lot of files is going very slowly

I am trying to get the list of unique extensions, and an example file of each, in a dataset that is about 9TB and has several hundred thousand files. I tried to use Get-ChildItem, and it works when I filter to folders that don't have a lot of files, but when I filter to one with a lot of files it seems like it will never start. Below are two examples that I have been trying.
$Extensions = New-Object System.Collections.ArrayList
$filesReviewed = 0
Get-ChildItem \\server\folder -Exclude 'excludeFolder' | Get-ChildItem | Where-Object {$_.Name.Equals('files')} | Get-ChildItem -OutBuffer 1000 |
foreach{
    Write-Progress -Activity "Files Reviewed: " -Status "$filesReviewed"
    $filesReviewed++
    if( $Extensions.contains($_.Extension) -eq $False) {
        $Extensions.add($_.Extension)
        Write-Host $_.Extension
        Write-Host $Path = $_.FullName
    }
}
I started to try to use dir, thinking it might be faster, but it has the same problem:
set-location \\server\folder
dir | dir | Where-Object {$_.Name.Equals('files')} | dir -OutBuffer 10
Get-ChildItem retrieves a lot of info about each file that you don't need in this case, which slows you down. You could try using [System.IO.Directory]::GetFiles to speed things up:
$extensions = @{}
[System.IO.Directory]::GetFiles("\\server\folder", "*.*", [System.IO.SearchOption]::AllDirectories) | % {
    $extensions[[System.IO.DirectoryInfo]::new($_).Extension]++
}
$extensions | ft -a
You may try the following:
(Get-ChildItem -Path C:\windows -File -Recurse).Extension | Select-Object -Unique
Of course, replace the path with the one you would like to use.
More details about Get-ChildItem can be found at: https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.management/get-childitem?view=powershell-6.
Hope it helps!
There are two keys to speeding up your code:
Avoid the use of the pipeline and therefore avoid using cmdlets.
If you cannot avoid the pipeline, avoid the use of custom script blocks ({ ... }), because executing one for every input object is time-consuming.
Generally, avoid the use of Write-Progress, which slows things down noticeably.
Avoiding cmdlets requires direct use of .NET framework types instead.
Lieven Keersmaekers' helpful answer is a promising start, although use of the pipeline (%, i.e. the ForEach-Object cmdlet) slows things down, as does the construction of a [System.IO.DirectoryInfo] instance in each iteration, though to a lesser degree.
Note: For brevity and simplicity, the following solution focuses on processing a given directory's entire subtree (the equivalent of Get-ChildItem -Recurse -File).
A performance-optimized solution:
Note the following aspects:
[System.IO.Directory]::EnumerateFiles() rather than Get-ChildItem is used to enumerate the files.
A foreach loop rather than the pipeline with the ForEach-Object cmdlet (%) is used.
Inside the loop, construction of unnecessary objects is avoided, by calling the static [System.IO.Path]::GetExtension() method to extract the filename extension.
$seenExtensions = @{}
foreach ($file in [IO.Directory]::EnumerateFiles($PWD.ProviderPath, '*', 'AllDirectories')) {
    if (-not $seenExtensions.ContainsKey(($ext = [IO.Path]::GetExtension($file)))) {
        $seenExtensions.Add($ext, $true)
        [pscustomobject] @{
            Extension = $ext
            Example = $file
        }
    }
}
The above outputs an array of custom objects each representing a unique extension (property .Extension) and the path of the first file with that extension encountered (.Example).
Sample output (note that the output won't be sorted by extension, but you can simply pipe to ... | Sort-Object Extension):
Extension Example
--------- -------
.json C:\temp\foo.json
.txt C:\temp\sub\bar.txt
...
If performance weren't a concern, PowerShell's cmdlets would allow for a much more elegant solution:
Get-ChildItem -File -Recurse |
Group-Object Extension |
Select @{ n='Extension'; e='Name' }, @{ n='Example'; e = { $_.Group[0].Name } }
Note that Group-Object implicitly sorts the output by the grouping property, so the output will be sorted alphabetically by filename extension.

PowerShell script to execute if threshold exceeded

First off, sorry for the long post - I'm trying to be detailed!
I'm looking to automate a workaround for an issue I discovered. I have a worker that periodically bombs once the "working" directory has more than 100,000 files in it. Preventatively, I can stop the process, rename the working directory to "HOLD", and create a new working dir to keep it going. Then I move files from the HOLD folder(s) back into the working dir a little bit at a time until it's caught up.
What I would like to do is automate the entire process via Task Scheduler with 2 PowerShell scripts.
----SCRIPT 1----
Here's the condition:
If file count in working dir is greater than 60,000
I find that [System.IO.Directory]::EnumerateFiles($Working) is faster than Get-ChildItem.
The actions:
Stop-Service for Service1, Service2, Service3
Rename-Item -Path "C:\Prod\Working\" -NewName "Hold", or "Hold1", "Hold2", "Hold3", etc. if the folder already exists -- I'm not particular about the numbering as long as it is consistent, so if it's easier to let the system name it HOLD, HOLD(1), HOLD(2), etc., or append the date after HOLD, then that's fine.
New-Item C:\Prod\Working -type directory
Start-Service Service1, Service2, Service3
---SCRIPT 2----
Condition:
If file count in working dir is less than 50,000
Actions:
Move 5,000 files from the HOLD* folder(s) -- Move 5k files from the HOLD folder until it is empty, then skip the empty folder and start moving files from HOLD1. This process should be dynamic and repeat for the next folders.
Before it comes up, I'm well aware it would be easier to simply move the files from the working folder to a Hold folder, but the size of the files can be very large and moving them always seems to take much longer.
I greatly appreciate any input and I'm eager to see some solid answers!
EDIT
Here's what I'm running for Script 2 - courtesy of Bacon:
#Setup
$restoreThreshold = 30000; # Ensure there's enough room so that restoring $restoreBatchSize
$restoreBatchSize = 500;   # files won't push $Working's file count above $restoreThreshold
$Working = "E:\UnprocessedTEST\"
$HoldBaseDirectory = "E:\"
while (@(Get-ChildItem -File -Path $Working).Length -lt $restoreThreshold - $restoreBatchSize)
{
    $holdDirectory = Get-ChildItem -Path $HoldBaseDirectory -Directory -Filter '*Hold*' |
        Select-Object -Last 1;
    if ($holdDirectory -eq $null)
    {
        # There are no Hold directories to process; don't keep looping
        break;
    }
    # Restore the first $restoreBatchSize files from $holdDirectory and store the count of files restored
    $restoredCount = Get-ChildItem $holdDirectory -File `
        | Select-Object -First $restoreBatchSize | Move-Item -Destination $Working -PassThru |
        Measure-Object | Select-Object -ExpandProperty 'Count';
    # If less than $restoreBatchSize files were restored then $holdDirectory is now empty; delete it
    if ($restoredCount -lt $restoreBatchSize)
    {
        Remove-Item -Path $holdDirectory;
    }
}
The first script could look like this:
$rotateThreshold = 60000;
$isThresholdExceeded = @(
    Get-ChildItem -File -Path $Working `
        | Select-Object -First ($rotateThreshold + 1) `
    ).Length -gt $rotateThreshold;
#Alternative: $isThresholdExceeded = @(Get-ChildItem -File -Path $Working).Length -gt $rotateThreshold;
if ($isThresholdExceeded)
{
    Stop-Service -Name 'Service1', 'Service2', 'Service3';
    try
    {
        $newName = 'Hold_{0:yyyy-MM-ddTHH-mm-ss}' -f (Get-Date);
        Rename-Item -Path $Working -NewName $newName;
    }
    finally
    {
        New-Item -ItemType Directory -Path $Working -ErrorAction SilentlyContinue;
        Start-Service -Name 'Service1', 'Service2', 'Service3';
    }
}
The reason for assigning $isThresholdExceeded the way I am is because we don't care what the exact count of files is, just whether it's above or below that threshold. As soon as we know the threshold has been exceeded we don't need any further results from Get-ChildItem (or the same for [System.IO.Directory]::EnumerateFiles($Working)), so as an optimization Select-Object will terminate the pipeline on the element after the threshold is reached. In a directory with 100,000 files on an SSD I found this to be almost 40% faster than allowing Get-ChildItem to enumerate all files (4.12 vs. 6.72 seconds). Other implementations using foreach or ForEach-Object proved to be slower than @(Get-ChildItem -File -Path $Working).Length.
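If you prefer the [System.IO.Directory]::EnumerateFiles($Working) route mentioned in the question, a rough, untested sketch of the same early-exit idea (assuming $Working is defined as above) might look like this:
$rotateThreshold = 60000;
# EnumerateFiles yields paths lazily, so we can stop counting as soon as the threshold is crossed
$count = 0;
$isThresholdExceeded = $false;
foreach ($path in [System.IO.Directory]::EnumerateFiles($Working))
{
    $count++;
    if ($count -gt $rotateThreshold)
    {
        $isThresholdExceeded = $true;
        break;
    }
}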
As for generating the new name for the 'Hold' directories, you could save and update an identifier somewhere, or just generate new names with an incrementing suffix until you find one that's not in use. I think it's easier to just base the name on the current time. As long as the script doesn't run more than once a second you'll know the name is unique, they'll sort just as well as numerals, plus it gives you a little diagnostic information (the time that directory was rotated out) for free.
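If you would rather use the incrementing-suffix scheme from the question, a hedged sketch (using the question's C:\Prod\Working path; not part of the answer above) could be:
# Probe Hold, Hold1, Hold2, ... until a name that does not exist yet is found
$newName = 'Hold';
$suffix = 0;
while (Test-Path -Path (Join-Path 'C:\Prod' $newName))
{
    $suffix++;
    $newName = "Hold$suffix";
}
Rename-Item -Path 'C:\Prod\Working' -NewName $newName;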
Here's some basic code for the second script:
$restoreThreshold = 50000;
$restoreBatchSize = 5000;
# Ensure there's enough room so that restoring $restoreBatchSize
# files won't push $Working's file count above $restoreThreshold
while (@(Get-ChildItem -File -Path $Working).Length -lt $restoreThreshold - $restoreBatchSize)
{
    $holdDirectory = Get-ChildItem -Path $HoldBaseDirectory -Directory -Filter 'Hold_*' `
        | Select-Object -First 1;
    if ($holdDirectory -eq $null)
    {
        # There are no Hold directories to process; don't keep looping
        break;
    }
    # Restore the first $restoreBatchSize files from $holdDirectory and store the count of files restored
    $restoredCount = Get-ChildItem -File -Path $holdDirectory.FullName `
        | Select-Object -First $restoreBatchSize `
        | Move-Item -Destination $Working -PassThru `
        | Measure-Object `
        | Select-Object -ExpandProperty 'Count';
    # If less than $restoreBatchSize files were restored then $holdDirectory is now empty; delete it
    if ($restoredCount -lt $restoreBatchSize)
    {
        Remove-Item -Path $holdDirectory.FullName;
    }
}
As noted in the comment before the while loop, the condition ensures that the count of files in $Working is at least $restoreBatchSize files away from $restoreThreshold, so that restoring $restoreBatchSize files won't exceed the threshold in the process. If you don't care about that, or the chosen threshold already accounts for that, you can change the condition to compare against $restoreThreshold instead of $restoreThreshold - $restoreBatchSize. Alternatively, leave the condition the same and change $restoreThreshold to 55000.
The way I've written the loop, on each iteration at most $restoreBatchSize files will be restored from the first 'Hold_*' directory it finds, then the file count in $Working is reevaluated. Considering that, as I understand it, files are being added to and removed from $Working externally to this script and simultaneously with its execution, this might be the safest and also the simplest approach. You could certainly enhance this by calculating how far below $restoreThreshold you are and performing the necessary number of batch restores, from one or more 'Hold_*' directories, all in one iteration of the loop; a sketch of that enhancement follows.
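A rough, untested sketch of that enhancement, reusing the variable names above; treat it as an illustration of the idea rather than drop-in code:
# Calculate how far below the (margin-adjusted) threshold we are, then pull exactly
# that many files from the Hold_* directories in a single pass
$deficit = ($restoreThreshold - $restoreBatchSize) - @(Get-ChildItem -File -Path $Working).Length;
foreach ($holdDirectory in Get-ChildItem -Path $HoldBaseDirectory -Directory -Filter 'Hold_*')
{
    if ($deficit -le 0) { break; }
    # Move up to $deficit files from this Hold_* directory and count how many actually moved
    $restoredCount = Get-ChildItem -File -Path $holdDirectory.FullName `
        | Select-Object -First $deficit `
        | Move-Item -Destination $Working -PassThru `
        | Measure-Object `
        | Select-Object -ExpandProperty 'Count';
    $deficit -= $restoredCount;
    # If nothing is left in this Hold_* directory, delete it
    if (@(Get-ChildItem -File -Path $holdDirectory.FullName).Length -eq 0)
    {
        Remove-Item -Path $holdDirectory.FullName;
    }
}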

How to use Powershell to list duplicate files in a folder structure that exist in one of the folders

I have a source tree, say c:\s, with many sub-folders. One of the sub-folders is called "c:\s\Includes" which can contain one or more .cs files recursively.
I want to make sure that none of the .cs files in the c:\s\Includes... path exist in any other folder under c:\s, recursively.
I wrote the following PowerShell script which works, but I'm not sure if there's an easier way to do it. I've had less than 24 hours experience with PowerShell so I have a feeling there's a better way.
I can assume at least PowerShell 3 being used.
I will accept any answer that improves my script, but I'll wait a few days before accepting the answer. When I say "improve", I mean it makes it shorter, more elegant or with better performance.
Any help from anyone would be greatly appreciated.
The current code:
$excludeFolder = "Includes"
$h = @{}
foreach ($i in ls $pwd.path *.cs -r -file | ? DirectoryName -notlike ("*\" + $excludeFolder + "\*")) { $h[$i.Name]=$i.DirectoryName }
ls ($pwd.path + "\" + $excludeFolder) *.cs -r -file | ? { $h.Contains($_.Name) } | Select #{Name="Duplicate";Expression={$h[$_.Name] + " has file with same name as " + $_.Fullname}}
1
I stared at this for a while, determined to write it without studying the existing answers, but I'd already glanced at the first sentence of Matt's answer mentioning Group-Object. After some different approaches, I get basically the same answer, except his is long-form and robust with regex character escaping and setup variables, mine is terse because you asked for shorter answers and because that's more fun.
$inc = '^c:\\s\\includes'
$cs = (gci -R 'c:\s' -File -I *.cs) | group name
$nopes = $cs |?{($_.Group.FullName -notmatch $inc)-and($_.Group.FullName -match $inc)}
$nopes | % {$_.Name; $_.Group.FullName}
Example output:
someFile.cs
c:\s\includes\wherever\someFile.cs
c:\s\lib\factories\alt\someFile.cs
c:\s\contrib\users\aa\testing\someFile.cs
The concept is:
Get all the .cs files in the whole source tree
Split them into groups of {filename: {files which share this filename}}
For each group, keep only those where the set of files contains any file with a path that matches the include folder and contains any file with a path that does not match the includes folder. This step covers
duplicates (if a file only exists once it cannot pass both tests)
duplicates across the {includes/not-includes} divide, instead of being duplicated within one branch
handles triplicates, n-tuplicates, as well.
Edit: I added the ^ to $inc to say it has to match at the start of the string, so the regex engine can fail faster for paths that don't match. Maybe this counts as premature optimization.
2
After that pretty dense attempt, the shape of a cleaner answer is much much easier:
Get all the files, split them into include, not-include arrays.
Nested for-loop testing every file against every other file.
Longer, but enormously quicker to write (it runs slower, though) and I imagine easier to read for someone who doesn't know what it does.
$sourceTree = 'c:\\s'
$allFiles = Get-ChildItem $sourceTree -Include '*.cs' -File -Recurse
$includeFiles = $allFiles | where FullName -imatch "$($sourceTree)\\includes"
$otherFiles = $allFiles | where FullName -inotmatch "$($sourceTree)\\includes"
foreach ($incFile in $includeFiles) {
    foreach ($oFile in $otherFiles) {
        if ($incFile.Name -ieq $oFile.Name) {
            write "$($incFile.Name) clash"
            write "* $($incFile.FullName)"
            write "* $($oFile.FullName)"
            write "`n"
        }
    }
}
3
Because code-golf is fun. If the hashtables are faster, what about this even less tested one-liner...
$h=@{};gci c:\s -R -file -Filt *.cs|%{$h[$_.Name]+=@($_.FullName)};$h.Values|?{$_.Count -gt 1 -and $_ -like 'c:\s\includes*'}
Edit: explanation of this version: It's doing much the same solution approach as version 1, but the grouping operation happens explicitly in the hashtable. The shape of the hashtable becomes:
$h = {
    'fileA.cs': @('c:\cs\wherever\fileA.cs', 'c:\cs\includes\fileA.cs'),
    'file2.cs': @('c:\cs\somewhere\file2.cs'),
    'file3.cs': @('c:\cs\includes\file3.cs', 'c:\cs\x\file3.cs', 'c:\cs\z\file3.cs')
}
It hits the disk once for all the .cs files, iterates the whole list to build the hashtable. I don't think it can do less work than this for that bit.
It uses +=, so it can add files to the existing array for that filename, otherwise it would overwrite each of the hashtable lists and they would be one item long for only the most recently seen file.
It uses @() - because when it hits a filename for the first time, $h[$_.Name] won't return anything, and the script needs to put an array into the hashtable at first, not a string. If it was +=$_.FullName then the first file would go into the hashtable as a string and the += next time would do string concatenation, and that's no use to me. This forces the first file in the hashtable to start an array by forcing every file to be a one-item array. The least-code way to get this result is with +=@(..), but that churn of creating throwaway arrays for every single file is needless work. Maybe changing it to longer code which does less array creation would help?
Changing the section
%{$h[$_.Name]+=@($_.FullName)}
to something like
%{if (!$h.ContainsKey($_.Name)){$h[$_.Name]=@()};$h[$_.Name]+=$_.FullName}
(I'm guessing, I don't have much intuition for what's most likely to be slow PowerShell code, and haven't tested).
After that, using $h.Values isn't going over every file for a second time, it's going over every array in the hashtable - one per unique filename. That's got to happen to check the array size and prune the not-duplicates, but the -and operation short-circuits - when the Count -gt 1 check fails, the bit on the right checking the path name doesn't run.
If the array has two or more files in it, the -and $_ -like ... executes and pattern matches to see if at least one of the duplicates is in the includes path. (Bug: if all the duplicates are in c:\cs\includes and none anywhere else, it will still show them).
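A possible fix for that bug, as a hedged sketch: also require that at least one copy lies outside the includes path, mirroring the -notmatch/-match pair from version 1 (on an array, -like keeps the matching elements and -notlike the non-matching ones, so each test is truthy when at least one such element exists):
$h.Values | ?{ $_.Count -gt 1 -and ($_ -like 'c:\s\includes*') -and ($_ -notlike 'c:\s\includes*') }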
--
4
This is edited version 3 with the hashtable initialization tweak, and now it keeps track of seen files in $s, and then only considers those it's seen more than once.
$h=@{};$s=@{};gci 'c:\s' -R -file -Filt *.cs|%{if($h.ContainsKey($_.Name)){$s[$_.Name]=1}else{$h[$_.Name]=@()};$h[$_.Name]+=$_.FullName};$s.Keys|%{if ($h[$_] -like 'c:\s\includes*'){$h[$_]}}
Assuming it works, that's what it does, anyway.
--
Edit branch of topic; I keep thinking there ought to be a way to do this with the things in the System.Data namespace. Anyone know if you can connect System.Data.DataTable().ReadXML() to gci | ConvertTo-Xml without reams of boilerplate?
I'd do more or less the same, except I'd build the hashtable from the contents of the includes folder and then run over everything else to check for duplicates:
$root = 'C:\s'
$includes = "$root\includes"
$includeList = @{}
Get-ChildItem -Path $includes -Filter '*.cs' -Recurse -File |
    % { $includeList[$_.Name] = $_.DirectoryName }
Get-ChildItem -Path $root -Filter '*.cs' -Recurse -File |
    ? { $_.FullName -notlike "$includes\*" -and $includeList.Contains($_.Name) } |
    % { "Duplicate of '{0}': {1}" -f $includeList[$_.Name], $_.FullName }
I'm not as impressed with this as I would like but I thought that Group-Object might have a place in this question so I present the following:
$base = 'C:\s'
$unique = "$base\includes"
$extension = "*.cs"
Get-ChildItem -Path $base -Filter $extension -Recurse |
    Group-Object Name |
    Where-Object{($_.Count -gt 1) -and (($_.Group).FullName -match [regex]::Escape($unique))} |
    ForEach-Object {
        $filename = $_.Name
        ($_.Group).FullName -notmatch [regex]::Escape($unique) | ForEach-Object{
            "'{0}' has file with same name as '{1}'" -f (Split-Path $_),$filename
        }
    }
Collect all the files with the extension filter $extension. Group the files based on their names. Then, of those groups, find every group where there is more than one of that particular file and at least one of the group members is in the directory $unique. Take those groups and print out all the files that are not from the unique directory.
From a comment:
For what it's worth, this is what I used for testing to create a bunch of files. (I know folder 9 is empty.)
$base = "E:\Temp\dev\cs"
Remove-Item "$base\*" -Recurse -Force
0..9 | %{[void](New-Item -ItemType directory "$base\$_")}
1..1000 | %{
    $number = Get-Random -Minimum 1 -Maximum 100
    $folder = Get-Random -Minimum 0 -Maximum 9
    [void](New-Item -Path $base\$folder -ItemType File -Name "$number.txt" -Force)
}
After looking at all the others, I thought I would try a different approach.
$includes = "C:\s\includes"
$root = "C:\s"
# First script
Measure-Command {
    [string[]]$filter = ls $includes -Filter *.cs -Recurse | % name
    ls $root -include $filter -Recurse -Filter *.cs |
        Where-object{$_.FullName -notlike "$includes*"}
}
# Second Script
Measure-Command {
    $filter2 = ls $includes -Filter *.cs -Recurse
    ls $root -Recurse -Filter *.cs |
        Where-object{$filter2.name -eq $_.name -and $_.FullName -notlike "$includes*"}
}
In my first script, I get all the include file names into a string array. Then I use that string array as an -Include parameter on Get-ChildItem. In the end, I filter the include folder out of the results.
In my second script, I enumerate everything and then filter after the pipe.
Remove the Measure-Command to see the results; I was using it to check the speed. With my dataset, the first one was 40% faster.
$FilesToFind = Get-ChildItem -Recurse 'c:\s\includes' -File -Include *.cs | Select-Object -ExpandProperty Name
Get-ChildItem -Recurse C:\S -File -Include *.cs | ? { $_.Name -in $FilesToFind -and $_.Directory -notmatch '^c:\\s\\includes' } | Select Name, Directory
Create a list of file names to look for.
Find all files that are in the list but not part of the directory the list was generated from
Print their name and directory

How to implement a parallel jobs and queues system in Powershell [duplicate]

I spent days trying to implement a parallel jobs-and-queues system, but... I tried and I can't make it work. Here is the code without implementing anything, and an example of the CSV it reads from.
I'm sure this post can help other users in their projects.
Each user has his own PC, so the CSV file looks like:
pc1,user1
pc2,user2
pc800,user800
CODE:
#Source File:
$inputCSV = '~\desktop\report.csv'
$csv = import-csv $inputCSV -Header PCName, User
echo $csv #debug
#Output File:
$report = "~\desktop\output.csv"
#---------------------------------------------------------------
#Define search:
$findSize = 40GB
Write-Host "Lonking for $findSize GB sized Outlook files"
#count issues:
$issues = 0
#---------------------------------------------------------------
foreach($item in $csv){
    if (Test-Connection -Quiet -count 1 -computer $($item.PCname)){
        $w7path = "\\$($item.PCname)\c$\users\$($item.User)\appdata\Local\microsoft\outlook"
        $xpPath = "\\$($item.PCname)\c$\Documents and Settings\$($item.User)\Local Settings\Application Data\Microsoft\Outlook"
        if(Test-Path $W7path){
            if(Get-ChildItem $w7path -Recurse -force -Include *.ost -ErrorAction "SilentlyContinue" | Where-Object {$_.Length -gt $findSize}){
                $newLine = "{0},{1},{2}" -f $($item.PCname),$($item.User),$w7path
                $newLine | add-content $report
                $issues ++
                Write-Host "Issue detected" #debug
            }
        }
        elseif(Test-Path $xpPath){
            if(Get-ChildItem $xpPath -Recurse -force -Include *.ost -ErrorAction "SilentlyContinue" | Where-Object {$_.Length -gt $findSize}){
                $newLine = "{0},{1},{2}" -f $($item.PCname),$($item.User),$xpPath
                $newLine | add-content $report
                $issues ++
                Write-Host "Issue detected" #debug
            }
        }
        else{
            write-host "Error! - bad path"
        }
    }
    else{
        write-host "Error! - no ping"
    }
}
Write-Host "All done! detected $issues issues"
Parallel data processing in PowerShell is not quite simple, especially with queueing. Try to use existing tools that already have this done.
You may take a look at the module SplitPipeline. Its cmdlet Split-Pipeline is designed for parallel input data processing and supports queueing of input (see the parameter Load). For example, for 4 parallel pipelines with 10 input items each at a time, the code will look like this:
$csv | Split-Pipeline -Count 4 -Load 10, 10 {process{
<operate on input item $_>
}} | Out-File $outputReport
All you have to do is implement the code <operate on input item $_>. Parallel processing and queueing are done by this command.
UPDATE for the updated question code. Here is prototype code with some remarks. They are important: doing work in parallel is not the same as doing it directly; there are some rules to follow.
$csv | Split-Pipeline -Count 4 -Load 10, 10 -Variable findSize {process{
# Tips
# - Operate on input object $_, i.e $_.PCname and $_.User
# - Use imported variable $findSize
# - Do not use Write-Host, use (for now) Write-Warning
# - Do not count issues (for now). This is possible but make it working
# without this at first.
# - Do not write data to a file, from several parallel pipelines this
# is not so trivial, just output data, they will be piped further to
# the log file
...
}} | Set-Content $report
# output from all jobs is joined and written to the report file
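Purely as an illustration (not from the SplitPipeline documentation, and untested), here is one way the placeholder might be filled in with the question's per-item logic while following the tips above - output is emitted as CSV lines instead of being written to a file from inside the jobs:
$csv | Split-Pipeline -Count 4 -Load 10, 10 -Variable findSize {process{
    if (Test-Connection -Quiet -Count 1 -Computer $_.PCname) {
        $w7path = "\\$($_.PCname)\c$\users\$($_.User)\appdata\Local\microsoft\outlook"
        $xpPath = "\\$($_.PCname)\c$\Documents and Settings\$($_.User)\Local Settings\Application Data\Microsoft\Outlook"
        $path = if (Test-Path $w7path) { $w7path } elseif (Test-Path $xpPath) { $xpPath } else { $null }
        if ($path) {
            if (Get-ChildItem $path -Recurse -Force -Include *.ost -ErrorAction SilentlyContinue |
                Where-Object { $_.Length -gt $findSize }) {
                # emit a CSV line; all job output is collected and written to $report below
                "{0},{1},{2}" -f $_.PCname, $_.User, $path
            }
        }
        else {
            Write-Warning "Error! - bad path for $($_.PCname)"
        }
    }
    else {
        Write-Warning "Error! - no ping for $($_.PCname)"
    }
}} | Set-Content $report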
UPDATE: How to write progress information
SplitPipeline handled an 800-target CSV pretty well, amazing. Is there any way to let the user know that the script is alive...? Scanning a big CSV can take about 20 mins. Something like "in progress 25%", "50%", "75%"...
There are several options. The simplest is just to invoke Split-Pipeline with the switch -Verbose. You will then get verbose messages about the progress and see that the script is alive.
Another simple option is to write and watch verbose messages from the jobs, e.g. Write-Verbose ... -Verbose, which will write messages even if Split-Pipeline is invoked without -Verbose.
And another option is to use proper progress messages with Write-Progress.
See the scripts:
Test-ProgressJobs.ps1
Test-ProgressTotal.ps1
Test-ProgressTotal.ps1 also shows how to use a collector updated from the jobs concurrently. You can use a similar technique for counting issues (the original question code does this). When all is done, show the total number of issues to the user.
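For instance, the second option could be as simple as adding one verbose line inside the job block; a minimal sketch (not taken from the linked scripts; the per-item work is left as a pass-through here):
$csv | Split-Pipeline -Count 4 -Load 10, 10 {process{
    # Write-Verbose ... -Verbose forces the message out even when the jobs run without -Verbose
    Write-Verbose "Processing $($_.PCname)" -Verbose
    $_  # pass the item through; replace with the real per-item work
}} | Set-Content $report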