How to speed up my SUPER SLOW search script - powershell

UPDATE (06/21/22): See my updated script below, which utilizes some of the answer.
I am building a script to search for $name through a large batch of CSV files. These files can be as big as 67,000 KB. This is my script that I use to search the files:
PowerShell script
Essentially, I use Import-Csv. I change a few things depending on the file name, however. For example, some files don't have headers, or they may use a different delimiter. Then I store all the matches in $results and then return that variable. This is all put in a function called CSVSearch for ease of running.
#create function called CSV Search
function CSVSearch{
#prompt
$name = Read-Host -Prompt 'Input name'
#set path to root folder
$path = 'Path\to\root\folder\'
#get the file path for each CSV file in root folder
$files = Get-ChildItem $path -Filter *.csv | Select-Object -ExpandProperty FullName
#count files in $files
$filesCount = $files.Count
#create empty array, $results
$results = @()
#count for write-progress
$i = 0
foreach($file in $files){
Write-Progress -Activity "Searching files: $i out of $filesCount searched. $resultsCount match(es) found" -PercentComplete (($i/$files.Count)*100)
#import method changes depending on CSV file name found in $file (headers, delimiters).
if($file -match 'File1*'){$results += Import-Csv $file -Header A, Name, C, D -Delimiter '|' | Select-Object *,@{Name='FileName';Expression={$file}} | Where-Object { $_.'Name' -match $name}}
if($file -match 'File2*'){$results += Import-Csv $file -Header A, B, Name -Delimiter '|' | Select-Object *,@{Name='FileName';Expression={$file}} | Where-Object { $_.'Name' -match $name}}
if($file -match 'File3*'){$results += Import-Csv $file | Select-Object *,@{Name='FileName';Expression={$file}} | Where-Object { $_.'Name' -match $name}}
if($file -match 'File4*'){$results += Import-Csv $file | Select-Object *,@{Name='FileName';Expression={$file}} | Where-Object { $_.'Name' -match $name}}
$i++
$resultsCount = $results.Count
}
#if the loop ends and $results array is empty, return "No matches."
if(!$results){Write-Host 'No matches found.' -ForegroundColor Yellow}
#return results stored in $results variable
else{$results
Write-Host $resultsCount 'matches found.' -ForegroundColor Green
Write-Progress -Activity "Completed" -Completed}
}
CSVSearch
Below is what the CSV files look like. Obviously, the amount of data shown does not equate to the actual size of the files, but this is the basic structure:
CSV files
File1.csv
1|Moonknight|QWEPP|L
2|Star Wars|QWEPP|T
3|Toy Story|QWEPP|U
File2.csv
JKLH|1|Moonknight
ASDF|2|Star Wars
QWER|3|Toy Story
File3.csv
1,Moonknight,AA,DDD
2,Star Wars,BB,CCC
3,Toy Story,CC,EEE
File4.csv
1,Moonknight,QWE
2,Star Wars,QWE
3,Toy Story,QWE
The script works great. Here is an example of the output I would receive if $name = Moonknight:
Example of results
A : 1
Name : Moonknight
C: QWE
FileName: Path\to\root\folder\File4.csv
A : 1
Name : Moonknight
B : AA
C : DDD
FileName: Path\to\root\folder\File3.csv
A : JKLH
B : 1
Name : Moonknight
FileName: Path\to\root\folder\File2.csv
A : 1
Name : Moonknight
C : QWEPP
D : L
FileName: Path\to\root\folder\File1.csv
4 matches found.
However, it is very slow, and I have a lot of files to search through. Any ideas on how to speed my script up?
Edit: I should mention that I tried importing the data into a hash table and then searching the hash table, but that was much slower.
UPDATED SCRIPT - My Solution (06/21/22):
This update utilizes some of Santiago's script below. I was having a hard time decoding everything he did, as I am new to PowerShell, so I sort of jerry-rigged my own solution, which uses a lot of his script/ideas.
The one thing that made a huge difference was outputting $results[$i] which returns the most recent match as the script is running. Probably not the most efficient way to do it, but it works for what I'm trying to do. Thanks!
function CSVSearch{
[cmdletbinding()]
param(
[Parameter(Mandatory)]
[string] $Name
)
$files = Get-ChildItem 'Path\to\root\folder\' -Filter *.csv -Recurse | %{$_.FullName}
$results = @()
$i = 0
foreach($file in $files){
if($file -like '*File1*'){$results += Import-Csv $file -Header A, Name, C, D -Delimiter '|' | Where-Object { $_.'Name' -match $Name} | Select-Object *,@{Name='FileName';Expression={$file}}}
if($file -like '*File2*'){$results += Import-Csv $file -Header A, B, Name -Delimiter '|' | Where-Object { $_.'Name' -match $Name} | Select-Object *,@{Name='FileName';Expression={$file}}}
if($file -like '*File3*'){$results += Import-Csv $file | Where-Object { $_.'Name' -match $Name} | Select-Object *,@{Name='FileName';Expression={$file}}}
if($file -like '*File4*'){$results += Import-Csv $file | Where-Object { $_.'Name' -match $Name} | Select-Object *,@{Name='FileName';Expression={$file}}}
$results[$i]
$i++
}
if(-not $results) {
Write-Host 'No matches found.' -ForegroundColor Yellow
return
}
Write-Host "$($results.Count) matches found." -ForegroundColor Green
}

Give this one a try; it should be a bit faster. Select-Object has to reconstruct your objects, so if you use it before filtering you're actually recreating your entire CSV. You want to filter first (Where-Object / .Where) before reconstructing.
.Where should be faster than Where-Object here; the caveat is that the intrinsic method requires that the collection already exists in memory - there is no pipeline processing and no streaming.
Write-Progress will only slow down your script; better to remove it.
Lastly, you can use splatting to avoid having multiple if conditions.
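Both ideas can be sketched on toy data before reading the full function below (the file name and sample rows here are hypothetical, not the OP's actual data):

```powershell
# Splatting: the hashtable keys become parameter names when passed with @.
$csvParams = @{ Header = 'A', 'Name', 'C'; Delimiter = '|' }

# Hypothetical sample file in the temp folder.
$demo = Join-Path ([IO.Path]::GetTempPath()) 'splat-demo.csv'
"1|Moonknight|QWE", "2|Star Wars|QWE" | Set-Content $demo

# Same as: Import-Csv $demo -Header A, Name, C -Delimiter '|'
$rows = Import-Csv $demo @csvParams

# .Where() filters the in-memory collection; no pipeline involved.
$found = $rows.Where({ $_.Name -match 'Moon' })
$found.Name   # Moonknight

Remove-Item $demo
```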
function CSVSearch {
[cmdletbinding()]
param(
[Parameter(Mandatory)]
[string] $Name,
[Parameter()]
[string] $Path = 'Path\to\root\folder\'
)
$param = @{
File1 = @{ Header = 'A', 'Name', 'C', 'D'; Delimiter = '|' }
File2 = @{ Header = 'A', 'B', 'Name' ; Delimiter = '|' }
File3 = @{}; File4 = @{} # File3 & 4 should have headers ?
}
$results = foreach($file in Get-ChildItem $Path -Filter file*.csv) {
$thisparam = $param[$file.BaseName]
$thisparam['LiteralPath'] = $file.FullName
(Import-Csv @thisparam).where{ $_.Name -match $name } |
Select-Object *, @{Name='FileName';Expression={$file}}
}
if(-not $results) {
Write-Host 'No matches found.' -ForegroundColor Yellow
return
}
Write-Host "$($results.Count) matches found." -ForegroundColor Green
$results
}
CSVSearch -Name Moonknight
If you want the function to stream results as they're found, you can use a filter; this is a very efficient filtering technique, certainly faster than Where-Object:
function CSVSearch {
[cmdletbinding()]
param(
[Parameter(Mandatory)]
[string] $Name,
[Parameter()]
[string] $Path = 'Path\to\root\folder\'
)
begin {
$param = @{
File1 = @{ Header = 'A', 'Name', 'C', 'D'; Delimiter = '|' }
File2 = @{ Header = 'A', 'B', 'Name' ; Delimiter = '|' }
File3 = @{}; File4 = @{} # File3 & 4 should have headers ?
}
$counter = [ref] 0
filter myFilter {
if($_.Name -match $name) {
$counter.Value++
$_ | Select-Object *, @{N='FileName';E={$file}}
}
}
}
process {
foreach($file in Get-ChildItem $path -Filter *.csv) {
$thisparam = $param[$file.BaseName]
$thisparam['LiteralPath'] = $file.FullName
Import-Csv @thisparam | myFilter
}
}
end {
if(-not $counter.Value) {
Write-Host 'No matches found.' -ForegroundColor Yellow
return
}
Write-Host "$($counter.Value) matches found." -ForegroundColor Green
}
}
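As a side note on the filter keyword used above: a filter is just shorthand for a function whose entire body is a process block, which is why it streams each object as it arrives. A minimal standalone example (the Double/Double2 names are made up for illustration):

```powershell
# 'filter' is shorthand for a function whose whole body is a process block.
filter Double { $_ * 2 }

# Equivalent long form:
function Double2 { process { $_ * 2 } }

1..3 | Double    # 2 4 6, emitted one object at a time
```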

Related

Appending file name and last write time as columns in a CSV

I have a bunch of text files that I am converting to a CSV.
For example I have a few hundred txt files that look like this
Serial Number : 123456
Measurement : 5
Test Data : 125
And each file is being converted to a single row on the CSV. I can't figure out how to add an additional column for the file name and the last write time.
This is what I currently have that copies all of the data from txt to CSV
$files = "path"
function Get-Data {
param (
[Parameter (Mandatory, ValueFromPipeline, Position=0)] $filename
)
$data = @{}
$lines=Get-Content -LiteralPath $filename | Where-Object {$_ -notlike '*---*'}
foreach ($line in $lines) {
$splitLine=$line.split(":")
$data.Add($splitLine[0],$splitLine[1])
}
return [PSCustomObject]$data
}
$files | Foreach-Object -Process {Get-Data $_} | Export-Csv -Path C:\Scripts\data.csv -NoTypeInformation -Force
I've tried doing this but it doesn't add anything. I might be trying to add the data the wrong way.
$files = "path"
function Get-Data {
param (
[Parameter (Mandatory, ValueFromPipeline, Position=0)] $filename
)
$data = @{}
$name = Get-ChildItem -literalpath $filename | Select Name
$data.Add("Filename", $name)
$lines=Get-Content -LiteralPath $filename | Where-Object {$_ -notlike '*---*'}
foreach ($line in $lines) {
$splitLine=$line.split(":")
$data.Add($splitLine[0],$splitLine[1])
}
return [PSCustomObject]$data
}
$files | Foreach-Object -Process {Get-Data $_} | Export-Csv -Path E:\Scripts\Pico2.csv -NoTypeInformation -Force
Here's a streamlined version of your code that should work as intended:
function Get-Data {
param (
[Parameter (Mandatory, ValueFromPipeline)]
[System.IO.FileInfo] $file # Accept direct output from Get-ChildItem
)
process { # Process each pipeline input object
# Initialize an ordered hashtable with
# the input file's name and its last write time.
$data = [ordered] @{
FileName = $file.Name
LastWriteTime = $file.LastWriteTime
}
# Read the file and parse its lines
# into property name-value pairs to add to the hashtable.
$lines = (Get-Content -ReadCount 0 -LiteralPath $file.FullName) -notlike '*---*'
foreach ($line in $lines) {
$name, $value = ($line -split ':', 2).Trim()
$data.Add($name, $value)
}
# Convert the hashtable to a [pscustomobject] instance
# and output it.
[PSCustomObject] $data
}
}
# Determine the input files via Get-ChildItem and
# pipe them directly to Get-Data, which in turn pipes to Export-Csv.
Get-ChildItem "path" |
Get-Data |
Export-Csv -Path C:\Scripts\data.csv -NoTypeInformation -Force
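The parsing line is worth a closer look: -split with a limit of 2 splits only on the first colon (so values containing colons survive intact), and calling .Trim() on the resulting array trims both pieces at once. A quick standalone check with a made-up line:

```powershell
# Split only on the first ':' (limit 2), then trim both halves.
$name, $value = ('Serial Number : 123:456' -split ':', 2).Trim()
$name    # Serial Number
$value   # 123:456
```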

Memory exception while filtering large CSV files

I am getting a memory exception while running this code. Is there a way to filter one file at a time, write the output, and append after processing each file? The code below seems to load everything into memory.
$inputFolder = "C:\Change\2019\October"
$outputFile = "C:\Change\2019\output.csv"
Get-ChildItem $inputFolder -File -Filter '*.csv' |
ForEach-Object { Import-Csv $_.FullName } |
Where-Object { $_.machine_type -eq 'workstations' } |
Export-Csv $outputFile -NoType
Maybe you can export and filter your files one by one and append the result to your output file, like this:
$inputFolder = "C:\Change\2019\October"
$outputFile = "C:\Change\2019\output.csv"
Remove-Item $outputFile -Force -ErrorAction SilentlyContinue
Get-ChildItem $inputFolder -Filter "*.csv" -file | %{import-csv $_.FullName | where machine_type -eq 'workstations' | export-csv $outputFile -Append -notype }
Note: The reason for not using Get-ChildItem ... | Import-Csv ... - i.e., for not directly piping Get-ChildItem to Import-Csv and instead having to call Import-Csv from the script block ({ ... }) of an auxiliary ForEach-Object call - is a bug in Windows PowerShell that has since been fixed in PowerShell Core; see the bottom section for a more concise workaround.
However, even output from ForEach-Object script blocks should stream to the remaining pipeline commands, so you shouldn't run out of memory - after all, a salient feature of the PowerShell pipeline is object-by-object processing, which keeps memory use constant, irrespective of the size of the (streaming) input collection.
You've since confirmed that avoiding the aux. ForEach-Object call does not solve the problem, so we still don't know what causes your out-of-memory exception.
Update:
This GitHub issue contains clues as to the reason for excessive memory use, especially with many properties that contain small amounts of data.
This GitHub feature request proposes using strongly typed output objects to mitigate the issue.
The following workaround, which uses the switch statement to process the files as text files, may help:
$header = ''
Get-ChildItem $inputFolder -Filter *.csv | ForEach-Object {
$i = 0
switch -Wildcard -File $_.FullName {
'*workstations*' {
# NOTE: If no other columns contain the word `workstations`, you can
# simplify and speed up the command by omitting the `ConvertFrom-Csv` call
# (you can make the wildcard matching more robust with something
# like '*,workstations,*')
if ((ConvertFrom-Csv "$header`n$_").machine_type -ne 'workstations') { continue }
$_ # row whose 'machine_type' column value equals 'workstations'
}
default {
if ($i++ -eq 0) {
if ($header) { continue } # header already written
else { $header = $_; $_ } # header row of 1st file
}
}
}
} | Set-Content $outputFile
Here's a workaround for the bug of not being able to pipe Get-ChildItem output directly to Import-Csv, by passing it as an argument instead:
Import-Csv -LiteralPath (Get-ChildItem $inputFolder -File -Filter *.csv) |
Where-Object { $_.machine_type -eq 'workstations' } |
Export-Csv $outputFile -NoType
Note that in PowerShell Core you could more naturally write:
Get-ChildItem $inputFolder -File -Filter *.csv | Import-Csv |
Where-Object { $_.machine_type -eq 'workstations' } |
Export-Csv $outputFile -NoType
Solution 2:
$inputFolder = "C:\Change\2019\October"
$outputFile = "C:\Change\2019\output.csv"
$encoding = [System.Text.Encoding]::UTF8 # modify encoding if necessary
$Delimiter=','
#find the header for the files => take the first row of the first file that has data
$Header = Get-ChildItem -Path $inputFolder -Filter *.csv | Where length -gt 0 | select -First 1 | Get-Content -TotalCount 1
#if no header was found, there is no file with size > 0 => quit
if(! $Header) {return}
#create array for header
$HeaderArray=$Header -split $Delimiter -replace '"', ''
#open output file
$w = New-Object System.IO.StreamWriter($outputfile, $true, $encoding)
#write the header found
$w.WriteLine($Header)
#loop on file csv
Get-ChildItem $inputFolder -File -Filter "*.csv" | %{
#open file for read
$r = New-Object System.IO.StreamReader($_.fullname, $encoding)
$skiprow = $true
while ($line = $r.ReadLine())
{
#exclude header
if ($skiprow)
{
$skiprow = $false
continue
}
#get an object for the current row, using the header found above
$Object=$line | ConvertFrom-Csv -Header $HeaderArray -Delimiter $Delimiter
#write the row to the output file if it matches the condition asked for
if ($Object.machine_type -eq 'workstations') { $w.WriteLine($line) }
}
$r.Close()
$r.Dispose()
}
$w.close()
$w.Dispose()
You have to read and write to the .csv files one row at a time, using StreamReader and StreamWriter:
$filepath = "C:\Change\2019\October"
$outputfile = "C:\Change\2019\output.csv"
$encoding = [System.Text.Encoding]::UTF8
$files = Get-ChildItem -Path $filePath -Filter *.csv
$w = New-Object System.IO.StreamWriter($outputfile, $true, $encoding)
$header = $null
foreach ($file in $files)
{
$r = New-Object System.IO.StreamReader($file.fullname, $encoding)
$firstline = $r.ReadLine()
if (!$header)
{
#write the header only once, taken from the first file
$header = $firstline
$w.WriteLine($header)
}
while (($line = $r.ReadLine()) -ne $null)
{
#keep only rows whose machine_type column is 'workstations'
if ((ConvertFrom-Csv "$header`n$line").machine_type -eq 'workstations')
{
$w.WriteLine($line)
}
}
$r.Close()
$r.Dispose()
}
$w.close()
$w.Dispose()
get-content *.csv | add-content combined.csv
Make sure combined.csv doesn't exist when you run this, or it's going to go full Ouroboros.

Read and write to the same CSV using PowerShell

I have a CSV file with Name and Status columns. We need to read the Name from the CSV file and perform a set of action on that, once we finish the action on one Name we need to update the status of that Name as 'Completed' on the same .csv file.
Name Status
poc_dev_admin
poc_dev_qa
poc_dev_qa1
poc_dev_qa2
poc_dev_qa3
poc_dev_qa7
poc_dev_qa8
poc_dev_ro
poc_dev_support
poc_dev_test1
poc_dev_test14
poc_dev_test15
poc_dev_test16
After execution of poc_dev_admin the CSV file should be like the following:
Name Status
poc_dev_admin Completed
poc_dev_qa
poc_dev_qa1
poc_dev_qa2
poc_dev_qa3
poc_dev_qa7
poc_dev_qa8
poc_dev_ro
poc_dev_support
poc_dev_test1
poc_dev_test14
poc_dev_test15
poc_dev_test16
I have tried the following logic, but it updates the Status column for all groups rather than updating it after processing the respective group. What change should I make so this works as expected?
$P = Import-Csv -Path c:\test\abintm.csv
foreach ($groupname in $P) {
### Process $groupname####
$newprocessstatus = 'Completed'
$P | Select-Object *, @{n='Status';e={$newprocessstatus}} |
Export-Csv -NoTypeInformation -Path C:\test\abintm.csv
}
Something like this should work for what you described:
$file = 'c:\test\abintm.csv'
$csv = Import-Csv $file
foreach ($row in $csv) {
$groupname = $row.Name
# ...
# process $groupname
# ...
$row.Status = 'Completed'
}
$csv | Export-Csv $file -NoType
If you need to update the CSV before processing the next record you can put the CSV export at the end of the loop (right after setting $row.Status to "Completed").
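That per-record variant might look like this (a sketch; the demo file, its path in the temp folder, and the sample rows are stand-ins for the question's c:\test\abintm.csv):

```powershell
# Demo stand-in for the question's CSV (hypothetical data).
$file = Join-Path ([IO.Path]::GetTempPath()) 'abintm-demo.csv'
@'
Name,Status
poc_dev_admin,
poc_dev_qa,
'@ | Set-Content $file

$csv = Import-Csv $file
foreach ($row in $csv) {
    # ... process $row.Name here ...
    $row.Status = 'Completed'
    # persist progress so the file is current before the next record
    $csv | Export-Csv $file -NoTypeInformation
}
```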
I modified your code for the use case you've specified:
Changed the current item variable in the foreach loop to $row instead of $groupname
Defined the $groupname variable inside the foreach block instead
$List = Import-Csv -Path "c:\test\abintm.csv"
foreach ($row in $List) {
$groupName = $row.Name
$status = $row.Status
if ($groupName -eq "poc_dev_admin") {
$status = "Completed"
($List | ? { $_.Name -eq $groupName }).Status = $status
}
}
if (($List | ? { $_.Status -eq "Completed" }) -ne $null) {
# $List has been modifed and hence write back the updated $List to the csv file
$List | Export-Csv -Notypeinformation -Path "C:\test\abintm.csv"
}
# else nothing is written

Add Color to a Column in Powershell Results

How can I add a different color to the Name Column. If it's possible will it keep the color if I export it to a txt file?
$time = (Get-Date).AddYears(-2)
Get-ChildItem -Recurse | Where-Object {$_.LastWriteTime -lt $time} |
ft -Wrap Directory,Name,LastWriteTime | Out-File spacetest.txt
Thanks
Take a look at (Get-Host).UI.RawUI.ForegroundColor.
And no, you can not save color to a text file.
You can save color into XML file, HTML file or use Excel or Word Automation to create appropriate files that do support color.
Communary.ConsoleExtensions [link] might help you
Invoke-ColorizedFileListing C:\Windows -m *.dmp
The above command will colorise file types and highlight dump files.
To save a color output, you would have to save to a format that preserves color, like RTF, or HTML. Txt (plain text file) only stores text.
The code below will save your output as an html file.
$time = (Get-Date).AddYears(-2)
Get-ChildItem -Recurse | Where-Object {$_.LastWriteTime -lt $time} |
Select Directory,Name,LastWriteTime |
ConvertTo-Html -Title "Services" -Body "<H2>The result of Get-ChildItem</H2> " -Property Directory,Name,LastWriteTime |
ForEach-Object {
if ($_ -like '<tr><td>*') {
$_ -replace '^(.*?)(<td>.*?</td>)<td>(.*?)</td>(.*)','$1$2<td><font color="green">$3</font></td>$4'
} else {
$_
}
} | Set-Content "$env:TEMP\ColorDirList.html" -Force
The line:
if ($_ -like '<tr><td>*') {
...checks for lines in the HTML output that are table rows.
The line:
$_ -replace '^(.*?)(<td>.*?</td>)<td>(.*?)</td>(.*)','$1$2<td><font color="green">$3</font></td>$4'
...uses a RegEx to replace the 2nd table cell contents with a font tag with the color green. This is a very simple RegEx search & replace that will only color the 2nd column.
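For example, applied to a single hypothetical table row from the generated HTML:

```powershell
# One made-up row as ConvertTo-Html would emit it.
$row = '<tr><td>C:\Temp</td><td>file.txt</td><td>08/07/2016</td></tr>'

# Wrap the 2nd cell's contents in a green font tag.
$colored = $row -replace '^(.*?)(<td>.*?</td>)<td>(.*?)</td>(.*)','$1$2<td><font color="green">$3</font></td>$4'
$colored
# <tr><td>C:\Temp</td><td><font color="green">file.txt</font></td><td>08/07/2016</td></tr>
```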
And here's another implementation of console only coloring, based on this link
$linestocolor = @(
'CSName Version OSArchitecture'
'------ ------- --------------'
'BENDER 6.1.7601 64-bit '
'LEELA 6.1.7601 64-bit '
'FRY 6.1.7600 64-bit '
'FARNSWORTH 6.1.7601 32-bit '
)
# http://www.bgreco.net/powershell/format-color/
function Format-Color {
[CmdletBinding()]
param(
[Parameter(ValueFromPipeline=$true,Mandatory=$true)]
$ToColorize
, [hashtable]$Colors = @{}
, [switch]$SimpleMatch
, [switch]$FullLine
)
Process {
$lines = ($ToColorize | Out-String).Trim() -replace "`r", "" -split "`n"
foreach($line in $lines) {
$color = ''
foreach($pattern in $Colors.Keys){
if (!$SimpleMatch -and !$FullLine -and $line -match "([\s\S]*?)($pattern)([\s\S]*)") { $color = $Colors[$pattern] }
elseif (!$SimpleMatch -and $line -match $pattern) { $color = $Colors[$pattern] }
elseif ($SimpleMatch -and $line -like $pattern) { $color = $Colors[$pattern] }
}
if ($color -eq '') { Write-Host $line }
elseif ($FullLine -or $SimpleMatch) { Write-Host $line -ForegroundColor $color }
else {
Write-Host $Matches[1] -NoNewline
Write-Host $Matches[2] -NoNewline -ForegroundColor $color
Write-Host $Matches[3]
}
}
}
}
$linestocolor | Format-Color -Colors @{'6.1.7600' = 'Red'; '32-bit' = 'Green'}
# doesn't work...
# (Get-ChildItem | Format-Table -AutoSize) | Format-Color -Colors @{'sql' = 'Red'; '08/07/2016' = 'Green'}
# does work...
Format-Color -ToColorize (Get-ChildItem | Format-Table -AutoSize) -Colors @{'sql' = 'Red'; '08/07/2016' = 'Green'}
return

Comparing csv files with -like in Powershell

I have two csv files, each that contain a PATH column. For example:
CSV1.csv
PATH,Data,NF
\\server1\folderA,1,1
\\server1\folderB,1,1
\\server2\folderA,1,1
\\server2\folderB,1,1
CSV2.csv
PATH,User,Access,Size
\\server1\folderA\file1,don,1
\\server1\folderA\file2,don,1
\\server1\folderA\file3,sue,1
\\server2\folderB\file1,don,1
What I'm attempting to do is create a script that will result in separate csv exports based on the paths in CSV1 such that the new files contain file values from CSV2 that match. For example, from the above, I'd end up with 2 results:
result1.csv
\\server1\folderA\file1,don,1
\\server1\folderA\file2,don,1
\\server1\folderA\file3,sue,1
result2.csv
\\server2\folderB\file1,don,1
Previously I've used a script like this when the two values are exact:
$reportfile = import-csv $apireportoutputfile -delimiter ';' -encoding unicode
$masterlist = import-csv $pathlistfile
foreach ($record in $masterlist)
{
$path=$record.Path
$filename = $path -replace '\\','_'
$filename = '.\Working\sharefiles\' + $filename + '.csv'
$reportfile | where-object {$_.path -eq $path} | select FilePath,UserName,LastAccessDate,LogicalSize | export-csv -path $filename
write-host " Creating files list for $path" -foregroundcolor red -backgroundcolor white
}
However, since the two path values are not the same, it returns nothing. I found the -like operator but am not sure how to use it in this code to get the results I want. Where-Object is a filter, while -like returns true/false. Am I on the right track? Any ideas for a solution?
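You are on the right track: -like can go straight inside the Where-Object script block, with a wildcard pattern built from the parent path. A minimal sketch using rows shaped like the question's CSV2 sample data:

```powershell
# Sample rows shaped like CSV2 (paths from the question).
$rows = @(
    [pscustomobject]@{ PATH = '\\server1\folderA\file1' }
    [pscustomobject]@{ PATH = '\\server2\folderB\file1' }
)
$parent = '\\server1\folderA'

# -like takes a wildcard pattern; the backslashes are matched literally.
$hits = $rows | Where-Object { $_.PATH -like "$parent\*" }
$hits.PATH   # \\server1\folderA\file1
```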
Something like this, maybe?
$ht = @{}
Import-Csv csv1.csv |
foreach { $ht[$_.path] = New-Object collections.arraylist }
Import-Csv csv2.csv |
foreach {
$path = $_.path | Split-Path -Parent
$ht[$path].Add($_) > $null
}
$i=1
$ht.Values |
foreach { if ($_.count)
{
$_ | Export-Csv "result$i.csv" -NoTypeInformation
$i++
}
}
My suggestion:
$1=ipcsv .\csv1.CSV
$2=ipcsv .\csv2.CSV
$equal = diff ($2|select @{n='PATH';e={Split-Path $_.PATH}}) $1 -Property PATH -IncludeEqual -ExcludeDifferent -PassThru
0..($equal.Count - 1) | %{
$i = $_
$2 | ?{ (Split-Path $_.PATH) -eq $equal[$i].PATH } | epcsv ".\Result$i.CSV"
}