PowerShell file parsing very slow

I am using PowerShell 7.
We have the following PowerShell script that parses some very large files.
I no longer want to use Get-Content, as it is too slow.
The script below works, but it takes a very long time to process even a 10 MB file.
I have about 200 files of roughly 10 MB each, with over 10,000 lines per file.
Sample Log:
#Fields:1
#Fields:2
#Fields:3
#Fields:4
#Fields: date-time,connector-id,session-id,sequence-number,local-endpoint,remote-endpoint,event,data,context
2023-01-31T13:53:50.404Z,EXCH1\Relay-EXCH1,08DAD23366676FF1,41,10.10.10.2:25,195.85.212.22:15650,<,DATA,
2023-01-31T13:53:50.404Z,EXCH1\Relay-EXCH1,08DAD23366676FF1,41,10.10.10.2:25,195.85.212.25:15650,<,DATA,
Script:
$Output = @()
$LogFilePath = "C:\LOGS\*.log"
$LogFiles = Get-Item $LogFilePath
$Count = @($LogFiles).Count
ForEach ($Log in $LogFiles)
{
    $Int = $Int + 1
    $Percent = $Int / $Count * 100
    Write-Progress -Activity "Collecting Log details" -Status "Processing log file $Int of $Count - $Log" -PercentComplete $Percent
    Write-Host "Processing Log File $Log" -ForegroundColor Magenta
    Write-Host
    $FileContent = Get-Content $Log | Select-Object -Skip 5
    ForEach ($Line in $FileContent)
    {
        $Socket = $Line.Split(",")[5]
        $IP = $Socket.Split(":")[0]
        $Output += $IP
    }
}
$Output = $Output | Select-Object -Unique
$Output = $Output | Sort-Object
Write-Host "List of noted remote IPs:"
$Output
Write-Host
$Output | Out-File $PWD\Output.txt

As @iRon suggests, the += assignment operator carries a lot of overhead, as does reading the entire file into a variable and then processing it. Instead, process it strictly as a pipeline. I achieved the same results, using your sample data, with the code written this way:
$LogFilePath = "C:\LOGS\*.log"
$LogFiles = Get-ChildItem $LogFilePath
$Count = #($logfiles).count
$Output = ForEach($Log in $Logfiles) {
# Code for Write-Progress here
Get-Content -Path $Log.FullName | Select-Object -Skip 5 | ForEach-Object {
$Socket = $_.split(",")[5]
$IP = $Socket.Split(":")[0]
$IP
}
}
$Output = $Output | Select-Object -Unique
$Output = $Output | Sort-Object
Write-Host "List of noted remove IPs:"
$Output
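If you want to confirm the speed-up on your own data, Measure-Command gives a quick way to time each variant. This is a minimal sketch; the single log file path is hypothetical and stands in for whichever input you are timing:
# Time the pipeline variant; swap the script-block body to time the original.
$elapsed = Measure-Command {
    Get-Content -Path 'C:\LOGS\sample.log' | Select-Object -Skip 5 | ForEach-Object {
        $_.Split(',')[5].Split(':')[0]
    } | Out-Null
}
'{0:N0} ms' -f $elapsed.TotalMilliseconds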

Apart from the notable points in the comments, I believe this question is more suitable for Code Review. Nonetheless, here's my take on this using the StreamReader class:
$LogFilePath = "C:\LOGS\*.log"
$LogFiles = Get-Item -Path $LogFilePath
$OutPut = [System.Collections.ArrayList]::new()
foreach ($log in $LogFiles)
{
$skip = 0
$stop = $false
$stream = [System.IO.StreamReader]::new($log.FullName)
while ($line = $stream.ReadLine())
{
if (-not$stop)
{
if ($skip++ -eq 5)
{
$stop = $true
}
continue
}
elseif ($OutPut.Contains(($IP = ($line -split ',|:')[6])))
{
continue
}
$null = $OutPut.Add($IP)
}
$stream.Close()
$stream.Dispose()
}
# Display OutPut and save to file
Write-Host -Object "List of noted remove IPs:"
$OutPut | Sort-Object | Tee-Object -FilePath "$PWD\Output.txt"
This way you output unique IPs directly, since the if statement checks each candidate against what's already in $OutPut, essentially replacing Select-Object -Unique. You should see a speed increase as you're no longer adding to a fixed-size array (+=) or piping to other cmdlets.

You can combine File.ReadLines with Enumerable.Skip to read your files while skipping their first 5 lines. This method is much faster than Get-Content. Then, for sorting and getting unique strings at the same time, you can use a SortedSet<T>.
You should avoid using Write-Progress, as it slows your script down in Windows PowerShell (this has been fixed in newer versions of PowerShell Core).
Do note that because you're looking to sort the result, all strings must be held in memory before outputting to a file. This would be much more efficient if sorting were not needed; in that case you would use a HashSet<T> instead to get unique values.
Get-Item C:\LOGS\*.log | & {
begin { $set = [Collections.Generic.SortedSet[string]]::new() }
process {
foreach($line in [Linq.Enumerable]::Skip([IO.File]::ReadLines($_.FullName), 5)) {
$null = $set.Add($line.Split(',')[5].Split(':')[0])
}
}
end {
$set
}
} | Set-Content $PWD\Output.txt
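For comparison, if sorted output were not required, the HashSet<T> variant mentioned above would look like this (same assumptions about the file layout; only the set type and the end block change):
Get-Item C:\LOGS\*.log | & {
    begin { $set = [Collections.Generic.HashSet[string]]::new() }
    process {
        foreach ($line in [Linq.Enumerable]::Skip([IO.File]::ReadLines($_.FullName), 5)) {
            # Add returns $false for duplicates, so the set stays unique for free
            $null = $set.Add($line.Split(',')[5].Split(':')[0])
        }
    }
    end { $set }
} | Set-Content $PWD\Output.txt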

Related

output result to a file

$outFile = "C:\PS logs\Outlook_autofill\test.csv"
$tests = Import-Csv -Header username,firstname,surname,pcname,5,6,7,8,9,10,11,12,13,14,15 $outFile |
sort -property #{Expression="username";Descending=$true}, #{Expression="pcname";Descending=$false}
$tests[0]
for ($i=1; $i -le $tests.length -1; $i++)
{
if ($tests[$i]."username" -eq $tests[$i-1]."username" -AND $tests[$i]."pcname" -eq $tests[$i-1]."pcname")
{
continue
}
else {$tests[$i]}
}
I managed to download the code from a site on the Internet and get it working, and it appears to do what I would like. However, I am unsure how to output the results back into a CSV.
Would I put an output line in the same loop as the continue?
Thank you kindly for any help.
You can use this construction right inside your loop to add each line to the file you need.
$NewLine = "{0},{1},{2}" -f $ValueForColumn1, $ValueForcolumn2, $ValueForcolumn3
Add-Content -Path $PathToFile -Value $NewLine
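Alternatively, since $tests (from your snippet) already holds objects with the right properties, you could capture the loop's output and pipe it all to Export-Csv in one go. A sketch; the deduped.csv output path is hypothetical:
# Collect the first row plus every non-duplicate row, then export once
$result = & {
    $tests[0]
    for ($i = 1; $i -le $tests.Length - 1; $i++)
    {
        if ($tests[$i].username -eq $tests[$i-1].username -and $tests[$i].pcname -eq $tests[$i-1].pcname)
        {
            continue
        }
        else { $tests[$i] }
    }
}
$result | Export-Csv "C:\PS logs\Outlook_autofill\deduped.csv" -NoTypeInformation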

Memory exception while filtering large CSV files

I'm getting a memory exception while running this code. Is there a way to filter one file at a time and write the output, appending after each file is processed? The code below seems to load everything into memory.
$inputFolder = "C:\Change\2019\October"
$outputFile = "C:\Change\2019\output.csv"
Get-ChildItem $inputFolder -File -Filter '*.csv' |
ForEach-Object { Import-Csv $_.FullName } |
Where-Object { $_.machine_type -eq 'workstations' } |
Export-Csv $outputFile -NoType
Maybe you can filter your files one by one and append the results to your output file, like this:
$inputFolder = "C:\Change\2019\October"
$outputFile = "C:\Change\2019\output.csv"
Remove-Item $outputFile -Force -ErrorAction SilentlyContinue
Get-ChildItem $inputFolder -Filter "*.csv" -file | %{import-csv $_.FullName | where machine_type -eq 'workstations' | export-csv $outputFile -Append -notype }
Note: The reason for not using Get-ChildItem ... | Import-Csv ... (i.e., for not directly piping Get-ChildItem to Import-Csv and instead having to call Import-Csv from the script block ({ ... }) of an auxiliary ForEach-Object call) is a bug in Windows PowerShell that has since been fixed in PowerShell Core; see the bottom section for a more concise workaround.
However, even output from ForEach-Object script blocks should stream to the remaining pipeline commands, so you shouldn't run out of memory - after all, a salient feature of the PowerShell pipeline is object-by-object processing, which keeps memory use constant, irrespective of the size of the (streaming) input collection.
You've since confirmed that avoiding the aux. ForEach-Object call does not solve the problem, so we still don't know what causes your out-of-memory exception.
Update:
This GitHub issue contains clues as to the reason for excessive memory use, especially with many properties that contain small amounts of data.
This GitHub feature request proposes using strongly typed output objects to help with the issue.
The following workaround, which uses the switch statement to process the files as text files, may help:
$header = ''
Get-ChildItem $inputFolder -Filter *.csv | ForEach-Object {
    $i = 0
    switch -Wildcard -File $_.FullName {
        '*workstations*' {
            # NOTE: If no other columns contain the word `workstations`, you can
            # simplify and speed up the command by omitting the `ConvertFrom-Csv` call
            # (you can make the wildcard matching more robust with something
            # like '*,workstations,*')
            if ((ConvertFrom-Csv "$header`n$_").machine_type -ne 'workstations') { continue }
            $_ # row whose 'machine_type' column value equals 'workstations'
        }
        default {
            if ($i++ -eq 0) {
                if ($header) { continue }     # header already written
                else { $header = $_; $_ }     # header row of 1st file
            }
        }
    }
} | Set-Content $outputFile
Here's a workaround for the bug of not being able to pipe Get-ChildItem output directly to Import-Csv, by passing it as an argument instead:
Import-Csv -LiteralPath (Get-ChildItem $inputFolder -File -Filter *.csv) |
Where-Object { $_.machine_type -eq 'workstations' } |
Export-Csv $outputFile -NoType
Note that in PowerShell Core you could more naturally write:
Get-ChildItem $inputFolder -File -Filter *.csv | Import-Csv |
Where-Object { $_.machine_type -eq 'workstations' } |
Export-Csv $outputFile -NoType
Solution 2:
$inputFolder = "C:\Change\2019\October"
$outputFile = "C:\Change\2019\output.csv"
$encoding = [System.Text.Encoding]::UTF8 # modify encoding if necessary
$Delimiter=','
#find header for your files => i take first row of first file with data
$Header = Get-ChildItem -Path $inputFolder -Filter *.csv | Where length -gt 0 | select -First 1 | Get-Content -TotalCount 1
#if not header founded then not file with sise >0 => we quit
if(! $Header) {return}
#create array for header
$HeaderArray=$Header -split $Delimiter -replace '"', ''
#open output file
$w = New-Object System.IO.StreamWriter($outputfile, $true, $encoding)
#write header founded
$w.WriteLine($Header)
#loop on file csv
Get-ChildItem $inputFolder -File -Filter "*.csv" | %{
#open file for read
$r = New-Object System.IO.StreamReader($_.fullname, $encoding)
$skiprow = $true
while ($line = $r.ReadLine())
{
#exclude header
if ($skiprow)
{
$skiprow = $false
continue
}
#Get objet for current row with header founded
$Object=$line | ConvertFrom-Csv -Header $HeaderArray -Delimiter $Delimiter
#write in output file for your condition asked
if ($Object.machine_type -eq 'workstations') { $w.WriteLine($line) }
}
$r.Close()
$r.Dispose()
}
$w.close()
$w.Dispose()
You have to read and write the .csv files one row at a time, using StreamReader and StreamWriter:
$filepath = "C:\Change\2019\October"
$outputfile = "C:\Change\2019\output.csv"
$encoding = [System.Text.Encoding]::UTF8
$files = Get-ChildItem -Path $filepath -Filter *.csv
$w = New-Object System.IO.StreamWriter($outputfile, $true, $encoding)
$headerWritten = $false
foreach ($file in $files)
{
    $r = New-Object System.IO.StreamReader($file.FullName, $encoding)
    # the first line of every file is its header row; write it once
    $header = $r.ReadLine()
    if ($null -eq $header) { $r.Close(); $r.Dispose(); continue }
    if (-not $headerWritten)
    {
        $w.WriteLine($header)
        $headerWritten = $true
    }
    # locate the machine_type column so data rows can be filtered below
    $typeIndex = [array]::IndexOf(($header -split ','), 'machine_type')
    while ($null -ne ($line = $r.ReadLine()))
    {
        # the machine_type filter has to be applied per data row, not per file
        if (($line -split ',')[$typeIndex] -eq 'workstations') { $w.WriteLine($line) }
    }
    $r.Close()
    $r.Dispose()
}
$w.Close()
$w.Dispose()
Get-Content *.csv | Add-Content combined.csv
Make sure combined.csv doesn't exist when you run this, or it's going to go full Ouroboros.
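A slightly more defensive sketch that excludes the output file from the input set, so re-running it is safe (assumes everything lives in the current directory):
Get-ChildItem *.csv -Exclude combined.csv | Get-Content | Add-Content combined.csv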

Not able to skip the first iteration in ForEach-Object using the continue statement

Hi, I have a PowerShell script where I am using ForEach-Object and would like to always skip the first iteration. I am using the continue statement, but its current behaviour is like break. Please suggest if I am doing something wrong here.
Below is the sample code.
$xmlfile = 'D:\testdirecotry\sample.xml'
[xml]$xmlcontent = (Get-Content $xmlfile)
$folderprefix = 'plm_z'
$regex = '<!--__AMAZONSITE id="(.+?)" instance="(.+?)"__-->'
$i = 0
(Get-Content $xmlfile) | Select-String -Pattern $regex | ForEach-Object {
    Write-Host "Test Iteration"
    if ($i -eq 0)
    {
        Write-Host "entering if loop"
        Write-Host $i
        $i++
        Write-Host $i
        continue
    }
    else
    {
        Write-Host "entering else loop"
        Write-Host $_
        $pscustomobject = @(
            # Write-Host $_
            $id = $_.Matches.Groups[1].Value
            $instance = $_.Matches.Groups[2].Value
            Write-Host "Do Something"
        )
    }
}
Inside a ForEach-Object script block there is no enclosing loop for continue to belong to, so it escapes the pipeline entirely, which is why it behaves like break. An easier way to skip the first object is to use Select-Object in the pipeline:
Get-Content $xmlfile |
Select-string -Pattern $regex |
Select-Object -Skip 1 |
ForEach-Object {
...
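If you'd rather keep your counter, note that inside a ForEach-Object script block it is return (not continue) that moves on to the next pipeline object. A minimal sketch of your loop rewritten that way, reusing $xmlfile and $regex from your snippet:
$i = 0
(Get-Content $xmlfile) | Select-String -Pattern $regex | ForEach-Object {
    if ($i++ -eq 0) { return }   # 'return' skips only this input object
    $id = $_.Matches.Groups[1].Value
    $instance = $_.Matches.Groups[2].Value
    Write-Host "Do Something with $id / $instance"
}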

Delete Files Older Than X Days: Presentation of Data/Logging Files

I'm new to PowerShell. I'm creating a script to delete files if they are more than "x" days old.
I'm almost done. I need help presenting my data as a table, and the script should not produce a log file if no files will be deleted.
Here's my code:
$max_days = "-30"
$curr_date = Get-Date
$del_date = $curr_date.AddDays($max_days)
$Path = "C:\Desktop\Code"
$DateTime = Get-Date -Format "D=yyyy-MM-dd_T=HH-mm-ss"
$itemsearch = Get-ChildItem C:\Test -Recurse | Where-Object { $_.LastWriteTime -lt $del_date}
Foreach ($item in $itemsearch)
{
Write "File:", $item.Name "Modified:", $item.LastWriteTime "Path:", $item.FullName "Date Deleted:" $del_date | Out-File "C:\Desktop\Code\Deleted\SFTP_DeleteFiles_WORKSPACE_$DateTime.txt" -append
$item | Remove-Item
}
Can anyone please help me? It's already working, by the way.
I just need to present the data in table form and not create a log file if there's nothing to delete.
Update:
Already solved the condition statement by doing:
if ($itemsearch)
{
    Foreach ($item in $itemsearch)
    {
        Write "File:", $item.Name, "Modified:", $item.LastWriteTime, "Path:", $item.FullName, "Date Deleted:", $del_date | Out-File "C:\Desktop\Code\Deleted\SFTP_DeleteFiles_WORKSPACE_$DateTime.txt" -Append
        $item | Remove-Item
    }
}
else
{
    Write "No files will be deleted."
}
Thanks!
What I want to display in the Excel/text file is like this one:
http://i59.tinypic.com/30wv33d.jpg
Anyone?
It returns me this:
IsReadOnly;"IsFixedSize";"IsSynchronized";"Keys";"Values";"SyncRoot";"Count"
False;"False";"False";"System.Collections.Hashtable+KeyCollection";"System.Collections.Hashtable+ValueCollection";"System.Object";"4"
False;"False";"False";"System.Collections.Hashtable+KeyCollection";"System.Collections.Hashtable+ValueCollection";"System.Object";"4"
False;"False";"False";"System.Collections.Hashtable+KeyCollection";"System.Collections.Hashtable+ValueCollection";"System.Object";"4"
False;"False";"False";"System.Collections.Hashtable+KeyCollection";"System.Collections.Hashtable+ValueCollection";"System.Object";"4"
In Excel. Do you have any idea? I have to search it though.
(Those odd columns appear because the CSV cmdlets serialize an object's properties, and a Hashtable's properties are things like IsReadOnly, Keys and Values rather than its entries; emitting [PSCustomObject] records avoids this.) To introduce tabular logging I would use a CSV file as output, replacing your foreach block with this code:
$results = @()
foreach ($item in $itemsearch)
{
    $success = $true
    try
    {
        # -ErrorAction Stop turns non-terminating errors into catchable ones
        $item | Remove-Item -ErrorAction Stop
    }
    catch
    {
        $success = $false
    }
    if ($success -eq $true)
    {
        Write-Host $item.FullName 'successfully deleted.'
        $results += [PSCustomObject]@{'File'=$item.Name; 'Modified'=$item.LastWriteTime; 'Path'=$item.FullName; 'Date Deleted'=$del_date; 'State'='SUCCESS'}
    }
    else
    {
        Write-Host 'Error deleting' $item.FullName
        $results += [PSCustomObject]@{'File'=$item.Name; 'Modified'=$item.LastWriteTime; 'Path'=$item.FullName; 'Date Deleted'=$del_date; 'State'='ERROR'}
    }
}
$results | Export-Csv -Path "C:\Desktop\Code\Deleted\SFTP_DeleteFiles_WORKSPACE_$DateTime.csv" -Encoding UTF8 -Delimiter ';' -NoTypeInformation
First an empty array is created ($results).
The try/catch block detects whether the deletion succeeded, and the appropriate record is added to $results (Remove-Item needs -ErrorAction Stop so that failures actually reach the catch block).
At the end the $results array is exported to CSV with a ';' separator, so you can open it right away in Excel.

How to have a FOREACH statement fire each time get-counter fires

I have the following code in use:
$Folder="C:\Perflogs\BBCRMLogs" # Change the bit in the quotation marks to whatever directory you want the log file stored in
$Computer = $env:COMPUTERNAME
$1GBInBytes = 1GB
$p = "LOTS OF COUNTERS";
# If you want to change the performance counters, change the above list. However, these are the recommended counters for a client machine.
$dir = test-path $Folder
IF($dir -eq $False)
{
New-Item $Folder -type directory
$num = 0
$file = "$Folder\SQL_log_${num}.csv"
Get-Counter -counter $p -SampleInterval 2 -Continuous |
Foreach {
if ((Get-Item $file).Length -gt 1MB) {
$num +=1;$file = "$Folder\SQL_log_${num}.csv"
}
$_
} |
Export-Counter $file -Force -FileFormat CSV
}
Else
{
$num = 0
$file = "$Folder\SQL_log_${num}.csv"
Get-Counter -counter $p -SampleInterval 2 -Continuous |
Foreach {
if ((Get-Item $file).Length -gt 1MB) {
$num +=1;$file = "$Folder\SQL_log_${num}.csv"
}
$_
} |
Export-Counter $file -Force -FileFormat CSV
}
However, even when ((Get-Item $file).Length -gt 1MB) is TRUE, it doesn't increment the file number. My thought is that the Foreach loop isn't being invoked each time a sample is taken, since Get-Counter is only called once (and then runs continuously). I'm not sure what construct I should be using to make sure each sample passes through that loop. Should I isolate that particular Foreach statement into another section, rather than relying on it being called during the Get-Counter? This PowerShell script is called by a batch file, and then the Get-Counter part runs in the background, collecting information.
The problem is that the $file variable passed to Export-Counter is only evaluated once, when Export-Counter starts executing. Pipe the results of Get-Counter to ForEach-Object and export inside of it (forcing $file to be re-evaluated), but note that this would overwrite the output file on each iteration, and unfortunately Export-Counter doesn't have an Append switch.
Off the top of my head, you could export to CSV using Export-Csv instead; in v3 it supports appending to the file. That said, you won't get the same CSV structure.
Two more things. On the first execution of the script, the first file has not been created yet when you check its length. That gives a file-not-found error; use the ErrorAction parameter to suppress it.
You don't need to repeat the code twice. Check whether the output directory exists and create it if it doesn't, then continue with the rest of the script once.
$Folder = 'D:\temp'
$num = 0
$file = "$Folder\SQL_log_${num}.csv"
if (!(Test-Path $Folder)) { New-Item $Folder -type directory }
Get-Counter -counter $p -SampleInterval 2 -Continuous | Foreach {
    if ((Get-Item $file -ErrorAction SilentlyContinue).Length -gt 1mb)
    {
        $num += 1
        $file = "$Folder\SQL_log_${num}.csv"
    }
    $_
} | Foreach-Object { $_ | Export-Csv $file -Append }