Appending file name and last write time as columns in a CSV - PowerShell

I have a bunch of text files that I am converting to a CSV. For example, I have a few hundred .txt files that look like this:
Serial Number : 123456
Measurement : 5
Test Data : 125
Each file is converted to a single row of the CSV. I can't figure out how to add additional columns for the file name and the last write time.
This is what I currently have that copies all of the data from the txt files to the CSV:
$files = "path"
function Get-Data {
param (
[Parameter (Mandatory, ValueFromPipeline, Position=0)] $filename
)
$data=#{}
$lines=Get-Content -LiteralPath $filename | Where-Object {$_ -notlike '*---*'}
foreach ($line in $lines) {
$splitLine=$line.split(":")
$data.Add($splitLine[0],$splitLine[1])
}
return [PSCustomObject]$data
}
$files | Foreach-Object -Process {Get-Data $_} | Export-Csv -Path C:\Scripts\data.csv -NoTypeInformation -Force
I've tried the following, but it doesn't add anything; I might be adding the data the wrong way.
$files = "path"
function Get-Data {
param (
[Parameter (Mandatory, ValueFromPipeline, Position=0)] $filename
)
$data=#{}
$name = Get-ChildItem -literalpath $filename | Select Name
$data.Add("Filename", $name)
$lines=Get-Content -LiteralPath $filename | Where-Object {$_ -notlike '*---*'}
foreach ($line in $lines) {
$splitLine=$line.split(":")
$data.Add($splitLine[0],$splitLine[1])
}
return [PSCustomObject]$data
}
$files | Foreach-Object -Process {Get-Data $_} | Export-Csv -Path E:\Scripts\Pico2.csv -NoTypeInformation -Force

Here's a streamlined version of your code that should work as intended:
function Get-Data {
    param (
        [Parameter(Mandatory, ValueFromPipeline)]
        [System.IO.FileInfo] $file # Accept direct output from Get-ChildItem
    )
    process { # Process each pipeline input object
        # Initialize an ordered hashtable with
        # the input file's name and its last write time.
        $data = [ordered] @{
            FileName      = $file.Name
            LastWriteTime = $file.LastWriteTime
        }
        # Read the file and parse its lines
        # into property name-value pairs to add to the hashtable.
        $lines = (Get-Content -ReadCount 0 -LiteralPath $file.FullName) -notlike '*---*'
        foreach ($line in $lines) {
            $name, $value = ($line -split ':', 2).Trim()
            $data.Add($name, $value)
        }
        # Convert the hashtable to a [pscustomobject] instance
        # and output it.
        [PSCustomObject] $data
    }
}
# Determine the input files via Get-ChildItem and
# pipe them directly to Get-Data, which in turn pipes to Export-Csv.
Get-ChildItem "path" |
    Get-Data |
    Export-Csv -Path C:\Scripts\data.csv -NoTypeInformation -Force
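For the sample file shown at the top of the question, the exported CSV would then look something like this (assuming the file is named sample.txt; the timestamp is illustrative):
"FileName","LastWriteTime","Serial Number","Measurement","Test Data"
"sample.txt","6/21/2022 9:30:00 AM","123456","5","125"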

How to speed up my SUPER SLOW search script

UPDATE (06/21/22): See my updated script below, which utilizes some of the answer.
I am building a script to search for $name through a large batch of CSV files. These files can be as big as 67,000 KB. This is my script that I use to search the files:
PowerShell script
Essentially, I use Import-Csv. I change a few things depending on the file name, however. For example, some files don't have headers, or they may use a different delimiter. Then I store all the matches in $results and then return that variable. This is all put in a function called CSVSearch for ease of running.
#create function called CSVSearch
function CSVSearch {
    #prompt
    $name = Read-Host -Prompt 'Input name'
    #set path to root folder
    $path = 'Path\to\root\folder\'
    #get the file path for each CSV file in root folder
    $files = Get-ChildItem $path -Filter *.csv | Select-Object -ExpandProperty FullName
    #count files in $files
    $filesCount = $files.Count
    #create empty array, $results
    $results = @()
    #count for write-progress
    $i = 0
    foreach ($file in $files) {
        Write-Progress -Activity "Searching files: $i out of $filesCount searched. $resultsCount match(es) found" -PercentComplete (($i/$files.Count)*100)
        #import method changes depending on CSV file name found in $file (headers, delimiters).
        if ($file -match 'File1*') { $results += Import-Csv $file -Header A, Name, C, D -Delimiter '|' | Select-Object *, @{Name='FileName';Expression={$file}} | Where-Object { $_.'Name' -match $name } }
        if ($file -match 'File2*') { $results += Import-Csv $file -Header A, B, Name -Delimiter '|' | Select-Object *, @{Name='FileName';Expression={$file}} | Where-Object { $_.'Name' -match $name } }
        if ($file -match 'File3*') { $results += Import-Csv $file | Select-Object *, @{Name='FileName';Expression={$file}} | Where-Object { $_.'Name' -match $name } }
        if ($file -match 'File4*') { $results += Import-Csv $file | Select-Object *, @{Name='FileName';Expression={$file}} | Where-Object { $_.'Name' -match $name } }
        $i++
        $resultsCount = $results.Count
    }
    #if the loop ends and $results array is empty, return "No matches."
    if (!$results) { Write-Host 'No matches found.' -ForegroundColor Yellow }
    #return results stored in $results variable
    else {
        $results
        Write-Host $resultsCount 'matches found.' -ForegroundColor Green
        Write-Progress -Activity "Completed" -Completed
    }
}
CSVSearch
Below is what the CSV files look like. Obviously, the amount of data shown does not reflect the actual size of the files, but the basic structure is the same:
CSV files
File1.csv
1|Moonknight|QWEPP|L
2|Star Wars|QWEPP|T
3|Toy Story|QWEPP|U
File2.csv
JKLH|1|Moonknight
ASDF|2|Star Wars
QWER|3|Toy Story
File3.csv
1,Moonknight,AA,DDD
2,Star Wars,BB,CCC
3,Toy Story,CC,EEE
File4.csv
1,Moonknight,QWE
2,Star Wars,QWE
3,Toy Story,QWE
The script works great. Here is an example of the output I would receive if $name = Moonknight:
Example of results
A        : 1
Name     : Moonknight
C        : QWE
FileName : Path\to\root\folder\File4.csv

A        : 1
Name     : Moonknight
B        : AA
C        : DDD
FileName : Path\to\root\folder\File3.csv

A        : JKLH
B        : 1
Name     : Moonknight
FileName : Path\to\root\folder\File2.csv

A        : 1
Name     : Moonknight
C        : QWEPP
D        : L
FileName : Path\to\root\folder\File1.csv
4 matches found.
However, it is very slow, and I have a lot of files to search through. Any ideas on how to speed my script up?
Edit: I should mention that I tried importing the data into a hash table and then searching the hash table, but that was much slower.
UPDATED SCRIPT - My Solution (06/21/22):
This update utilizes some of Santiago's script below. I was having a hard time decoding everything he did, as I am new to PowerShell, so I sort of jury-rigged my own solution that uses a lot of his script/ideas.
The one thing that made a huge difference was outputting $results[$i], which emits the most recent match while the script is running. Probably not the most efficient way to do it, but it works for what I'm trying to do. Thanks!
function CSVSearch {
    [cmdletbinding()]
    param(
        [Parameter(Mandatory)]
        [string] $Name
    )
    $files = Get-ChildItem 'Path\to\root\folder\' -Filter *.csv -Recurse | % { $_.FullName }
    $results = @()
    $i = 0
    foreach ($file in $files) {
        if ($file -like '*File1*') { $results += Import-Csv $file -Header A, Name, C, D -Delimiter '|' | Where-Object { $_.'Name' -match $Name } | Select-Object *, @{Name='FileName';Expression={$file}} }
        if ($file -like '*File2*') { $results += Import-Csv $file -Header A, B, Name -Delimiter '|' | Where-Object { $_.'Name' -match $Name } | Select-Object *, @{Name='FileName';Expression={$file}} }
        if ($file -like '*File3*') { $results += Import-Csv $file | Where-Object { $_.'Name' -match $Name } | Select-Object *, @{Name='FileName';Expression={$file}} }
        if ($file -like '*File4*') { $results += Import-Csv $file | Where-Object { $_.'Name' -match $Name } | Select-Object *, @{Name='FileName';Expression={$file}} }
        $results[$i]
        $i++
    }
    if (-not $results) {
        Write-Host 'No matches found.' -ForegroundColor Yellow
        return
    }
    Write-Host "$($results.Count) matches found." -ForegroundColor Green
}
Give this one a try; it should be a bit faster. Select-Object has to reconstruct your objects, so if you use it before filtering, you're actually recreating your entire CSV. You want to filter first (Where-Object / .Where) before reconstructing.
.Where should be faster than Where-Object here; the caveat is that this intrinsic method requires the collection to already exist in memory, so there is no pipeline processing and no streaming.
Write-Progress will only slow down your script; better to remove it.
Lastly, you can use splatting to avoid having multiple if conditions.
function CSVSearch {
    [cmdletbinding()]
    param(
        [Parameter(Mandatory)]
        [string] $Name,
        [Parameter()]
        [string] $Path = 'Path\to\root\folder\'
    )
    $param = @{
        File1 = @{ Header = 'A', 'Name', 'C', 'D'; Delimiter = '|' }
        File2 = @{ Header = 'A', 'B', 'Name'     ; Delimiter = '|' }
        File3 = @{}; File4 = @{} # File3 & 4 should have headers?
    }
    $results = foreach ($file in Get-ChildItem $Path -Filter file*.csv) {
        $thisparam = $param[$file.BaseName]
        $thisparam['LiteralPath'] = $file.FullName
        (Import-Csv @thisparam).where{ $_.Name -match $Name } |
            Select-Object *, @{Name='FileName';Expression={$file}}
    }
    if (-not $results) {
        Write-Host 'No matches found.' -ForegroundColor Yellow
        return
    }
    Write-Host "$($results.Count) matches found." -ForegroundColor Green
    $results
}
CSVSearch -Name Moonknight
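If you do still want progress feedback without paying the per-file cost of Write-Progress, one option is to refresh it only every N files; a minimal sketch (the interval of 100 is arbitrary):
$files = Get-ChildItem 'Path\to\root\folder\' -Filter *.csv
$i = 0
foreach ($file in $files) {
    if ($i % 100 -eq 0) { # refresh the progress bar only every 100 files
        Write-Progress -Activity 'Searching' -PercentComplete (($i / $files.Count) * 100)
    }
    $i++
}
Write-Progress -Activity 'Searching' -Completed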
If you want the function to stream results as they're found, you can use a filter; this is a very efficient filtering technique, certainly faster than Where-Object:
function CSVSearch {
    [cmdletbinding()]
    param(
        [Parameter(Mandatory)]
        [string] $Name,
        [Parameter()]
        [string] $Path = 'Path\to\root\folder\'
    )
    begin {
        $param = @{
            File1 = @{ Header = 'A', 'Name', 'C', 'D'; Delimiter = '|' }
            File2 = @{ Header = 'A', 'B', 'Name'     ; Delimiter = '|' }
            File3 = @{}; File4 = @{} # File3 & 4 should have headers?
        }
        $counter = [ref] 0
        filter myFilter {
            if ($_.Name -match $Name) {
                $counter.Value++
                $_ | Select-Object *, @{N='FileName';E={$file}}
            }
        }
    }
    process {
        foreach ($file in Get-ChildItem $Path -Filter *.csv) {
            $thisparam = $param[$file.BaseName]
            $thisparam['LiteralPath'] = $file.FullName
            Import-Csv @thisparam | myFilter
        }
    }
    end {
        if (-not $counter.Value) {
            Write-Host 'No matches found.' -ForegroundColor Yellow
            return
        }
        Write-Host "$($counter.Value) matches found." -ForegroundColor Green
    }
}
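The function is invoked the same way as before:
CSVSearch -Name Moonknight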

Memory exception while filtering large CSV files

I'm getting a memory exception while running this code. Is there a way to filter one file at a time, write the output, and append after processing each file? The code below seems to load everything into memory.
$inputFolder = "C:\Change\2019\October"
$outputFile = "C:\Change\2019\output.csv"
Get-ChildItem $inputFolder -File -Filter '*.csv' |
    ForEach-Object { Import-Csv $_.FullName } |
    Where-Object { $_.machine_type -eq 'workstations' } |
    Export-Csv $outputFile -NoType
Maybe you can import and filter your files one by one and append the result to your output file, like this:
$inputFolder = "C:\Change\2019\October"
$outputFile = "C:\Change\2019\output.csv"
Remove-Item $outputFile -Force -ErrorAction SilentlyContinue
Get-ChildItem $inputFolder -Filter "*.csv" -File | ForEach-Object {
    Import-Csv $_.FullName | Where-Object machine_type -eq 'workstations' |
        Export-Csv $outputFile -Append -NoType
}
Note: The reason for not using Get-ChildItem ... | Import-Csv ... - i.e., for not directly piping Get-ChildItem to Import-Csv and instead calling Import-Csv from the script block ({ ... }) of an auxiliary ForEach-Object call - is a bug in Windows PowerShell that has since been fixed in PowerShell Core; see the bottom section for a more concise workaround.
However, even output from ForEach-Object script blocks should stream to the remaining pipeline commands, so you shouldn't run out of memory - after all, a salient feature of the PowerShell pipeline is object-by-object processing, which keeps memory use constant, irrespective of the size of the (streaming) input collection.
You've since confirmed that avoiding the aux. ForEach-Object call does not solve the problem, so we still don't know what causes your out-of-memory exception.
Update:
This GitHub issue contains clues as to the reason for excessive memory use, especially with many properties that contain small amounts of data.
This GitHub feature request proposes using strongly typed output objects to help the issue.
The following workaround, which uses the switch statement to process the files as text files, may help:
$header = ''
Get-ChildItem $inputFolder -Filter *.csv | ForEach-Object {
    $i = 0
    switch -Wildcard -File $_.FullName {
        '*workstations*' {
            # NOTE: If no other columns contain the word `workstations`, you can
            # simplify and speed up the command by omitting the `ConvertFrom-Csv` call
            # (you can make the wildcard matching more robust with something
            # like '*,workstations,*')
            if ((ConvertFrom-Csv "$header`n$_").machine_type -ne 'workstations') { continue }
            $_ # row whose 'machine_type' column value equals 'workstations'
        }
        default {
            if ($i++ -eq 0) {
                if ($header) { continue } # header already written
                else { $header = $_; $_ } # header row of 1st file
            }
        }
    }
} | Set-Content $outputFile
Here's a workaround for the bug of not being able to pipe Get-ChildItem output directly to Import-Csv, by passing it as an argument instead:
Import-Csv -LiteralPath (Get-ChildItem $inputFolder -File -Filter *.csv) |
    Where-Object { $_.machine_type -eq 'workstations' } |
    Export-Csv $outputFile -NoType
Note that in PowerShell Core you could more naturally write:
Get-ChildItem $inputFolder -File -Filter *.csv | Import-Csv |
    Where-Object { $_.machine_type -eq 'workstations' } |
    Export-Csv $outputFile -NoType
Solution 2:
$inputFolder = "C:\Change\2019\October"
$outputFile = "C:\Change\2019\output.csv"
$encoding = [System.Text.Encoding]::UTF8 # modify encoding if necessary
$Delimiter = ','
# find the header for your files => take the first row of the first file with data
$Header = Get-ChildItem -Path $inputFolder -Filter *.csv | Where-Object Length -gt 0 | Select-Object -First 1 | Get-Content -TotalCount 1
# if no header was found, there is no file with size > 0 => quit
if (!$Header) { return }
# create an array from the header
$HeaderArray = $Header -split $Delimiter -replace '"', ''
# open the output file
$w = New-Object System.IO.StreamWriter($outputFile, $true, $encoding)
# write the header we found
$w.WriteLine($Header)
# loop over the csv files
Get-ChildItem $inputFolder -File -Filter "*.csv" | ForEach-Object {
    # open the file for reading
    $r = New-Object System.IO.StreamReader($_.FullName, $encoding)
    $skiprow = $true
    while (($line = $r.ReadLine()) -ne $null) {
        # skip the header row
        if ($skiprow) {
            $skiprow = $false
            continue
        }
        # build an object for the current row, using the header we found
        $Object = $line | ConvertFrom-Csv -Header $HeaderArray -Delimiter $Delimiter
        # write the row to the output file if it meets the condition
        if ($Object.machine_type -eq 'workstations') { $w.WriteLine($line) }
    }
    $r.Close()
    $r.Dispose()
}
$w.Close()
$w.Dispose()
You have to read and write to the .csv files one row at a time, using StreamReader and StreamWriter:
$filepath = "C:\Change\2019\October"
$outputfile = "C:\Change\2019\output.csv"
$encoding = [System.Text.Encoding]::UTF8
# Note: machine_type is a CSV column, not a property of the files themselves,
# so the filtering has to happen per line rather than on the Get-ChildItem output.
$files = Get-ChildItem -Path $filepath -Filter *.csv
$w = New-Object System.IO.StreamWriter($outputfile, $true, $encoding)
$skiprow = $false
foreach ($file in $files)
{
    $r = New-Object System.IO.StreamReader($file.FullName, $encoding)
    while (($line = $r.ReadLine()) -ne $null)
    {
        # write the first file's header row, then only the matching data rows
        # (assumes 'workstations' only ever appears in the machine_type column)
        if (!$skiprow -or $line -match 'workstations')
        {
            $w.WriteLine($line)
        }
        $skiprow = $true
    }
    $r.Close()
    $r.Dispose()
}
$w.Close()
$w.Dispose()
get-content *.csv | add-content combined.csv
Make sure combined.csv doesn't exist when you run this, or it's going to go full Ouroboros. Note also that this simply concatenates the files, header rows and all, and does no filtering.
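A sketch that keeps only the first file's header (assuming all files share the same header; still no row filtering):
$first = $true
Get-ChildItem $inputFolder -Filter *.csv | ForEach-Object {
    $skip = if ($first) { 0 } else { 1 } # keep the header only from the first file
    Get-Content $_.FullName | Select-Object -Skip $skip
    $first = $false
} | Add-Content combined.csv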

Powershell foreach loop for csv files and save to different folder

I am trying to read multiple CSVs in a folder, replace a few unwanted characters, and save each to a CSV with the same name in a different folder.
I tried using a foreach loop and a ForEach-Object loop to dynamically generate the destination file name:
#CSV's in folder (ABC.csv, DEF.csv)
param ( [String] $Filename )
$SourceFolderPath = Get-Childitem "C:\data\"
$DestinationFolderPath = "C:\data\cleaned\"
$Source = $SourceFolderPath + $FileName
$Destination = $DestinationFolderPath + $Filename
ForEach ($f in $SourceFolderPath) {$F}
{
    Import-Csv $f | ForEach-Object -Begin { $writeHeader = $True } {
        if ($writeHeader) { $writeHeader = $False; $_.psobject.properties.Name -join ',' }
        $_.psobject.properties.Value -replace ',', '' -replace '"', '' -join ','
    } | Set-Content $Destination
}
The code is running if I manually pass $FileName = "ABC.csv". But I need it to dynamically do this for all the files in the folder.
So you want to clean all column data of contained commas and also remove all double quotes from the resulting CSV files?
I suggest defining defaults for the param strings, so you can supply other values if desired, but you don't have to.
The script temporarily changes the delimiter to a vertical bar |, so this character shouldn't occur in the file content.
## Q:\Test\2019\08\27\SO_57678154.ps1
param ( [String] $Filename = '*.csv',
        [String] $SourceDir = 'C:\Data\',
        [String] $TargetDir = 'C:\Data\cleaned\'
)
ForEach ($csvFile in Get-ChildItem -Path $SourceDir -Filter $Filename) {
    (Import-Csv $csvFile |
        ConvertTo-Csv -NoTypeInformation -Delimiter '|') -replace '[,"]' -replace '\|', ',' |
        Set-Content (Join-Path $TargetDir $csvFile.Name)
}
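Saved as a script (the comment line above carries the name SO_57678154.ps1), it can then be invoked with explicit values whenever the defaults don't fit, for example:
.\SO_57678154.ps1 -Filename 'ABC.csv' -SourceDir 'C:\data\' -TargetDir 'C:\data\cleaned\'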

Perform function for all c:\users\*\AppData\Local

Got the following script, which I thought would happily update the specified .ini file in each C:\Users\*\AppData\Local\Greentram folder individually.
function Set-OrAddIniValue {
    Param(
        [string]$FilePath,
        [hashtable]$keyValueList
    )
    $content = Get-Content $FilePath
    $keyValueList.GetEnumerator() | ForEach-Object {
        if ($content -match "^$($_.Key)=") {
            $content = $content -replace "^$($_.Key)=(.*)", "$($_.Key)=$($_.Value)"
        } else {
            $content += "$($_.Key)=$($_.Value)"
        }
    }
    $content | Set-Content $FilePath
}
Set-OrAddIniValue -FilePath "C:\Users\*\AppData\Local\Greentram\SDA_Apps.ini" -keyValueList @{
    UserName  = "Dcebtcv7[[G"
    UserEmail = "x}tpwpjmkxmvkYjmklzmx7zv7lr"
    UserNo    = "*++*(+"
    UserKey   = "^X(_0[*_/0L)\_0,U,-"
    KEM       = "H10"
}
What it seems to be doing is somehow combining all the .INI files together and creating a new .INI file for each user.
I have wrongly assumed that C:\Users\*\AppData\Local\Greentram\SDA_Apps.ini would work.
I only want to update or add these specific values to each .INI file.
Set-OrAddIniValue -FilePath "C:\Users\*\AppData\Local\Greentram\SDA_Apps.ini" -keyValueList @{
    UserName  = "Dcebtcv7[[G"
    UserEmail = "x}tpwpjmkxmvkYjmklzmx7zv7lr"
    UserNo    = "*++*(+"
    UserKey   = "^X(_0[*_/0L)\_0,U,-"
    KEM       = "H10"
}
Your function Set-OrAddIniValue doesn't handle wildcards in paths.
$content = Get-Content $FilePath
...
$content | Set-Content $FilePath
The first statement reads the content of all matching files into a single array. The second statement then writes the entire modified content to all matching files. (How would it decide which content belongs to which file?)
You can either call your function for each file individually:
Get-ChildItem "C:\Users\*\AppData\Local\Greentram\SDA_Apps.ini" | ForEach-Object {
Set-OrAddIniValue -FilePath $_.FullName -keyValueList ...
}
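With the key-value list from the question filled in, that call would look like:
Get-ChildItem "C:\Users\*\AppData\Local\Greentram\SDA_Apps.ini" | ForEach-Object {
    Set-OrAddIniValue -FilePath $_.FullName -keyValueList @{
        UserName  = "Dcebtcv7[[G"
        UserEmail = "x}tpwpjmkxmvkYjmklzmx7zv7lr"
        UserNo    = "*++*(+"
        UserKey   = "^X(_0[*_/0L)\_0,U,-"
        KEM       = "H10"
    }
}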
or change your function so that it does the enumeration internally:
function Set-OrAddIniValue {
    Param(
        [string]$FilePath,
        [hashtable]$keyValueList
    )
    Get-ChildItem $FilePath | Where-Object {
        -not $_.PSIsContainer # process only files
    } | ForEach-Object {
        $file = $_.FullName
        $content = Get-Content $file
        ...
        $content | Set-Content $file
    }
}
On PowerShell v3 and newer you can use Get-ChildItem -File instead of piping the object list through Where-Object {-not $_.PSIsContainer}.

Using Powershell to replace multiple strings in multiple files & folders

I have a list of strings in a CSV file. The format is:
OldValue,NewValue
223134,875621
321321,876330
....
and the file contains a few hundred rows (each OldValue is unique). I need to process changes over a number of text files in a number of folders & subfolders. My best guess of the number of folders, files, and lines of text is: 15 folders, around 150 text files in each folder, with approximately 65,000 lines of text in each folder (between 400-500 lines per text file).
I will make 2 passes at the data, unless I can do it in one. First pass is to generate a text file I will use as a check list to review my changes. Second pass is to actually make the change in the file. Also, I only want to change the text files where the string occurs (not every file).
I'm using the following Powershell script to go through the files & produce a list of the changes needed. The script runs, but is beyond slow. I haven't worked on the replace logic yet, but I assume it will be similar to what I've got.
# replace a string in a file with powershell
[reflection.assembly]::loadwithpartialname("Microsoft.VisualBasic") | Out-Null
Function Search {
    # Parameters $Path and $SearchString
    param ([Parameter(Mandatory=$true, ValueFromPipeline = $true)][string]$Path,
           [Parameter(Mandatory=$true)][string]$SearchString
    )
    try {
        #.NET FindInFiles Method to Look for file
        [Microsoft.VisualBasic.FileIO.FileSystem]::GetFiles(
            $Path,
            [Microsoft.VisualBasic.FileIO.SearchOption]::SearchAllSubDirectories,
            $SearchString
        )
    } catch { $_ }
}
if (Test-Path "C:\Work\ListofAllFilenamesToSearch.txt") { # if file exists
    Remove-Item "C:\Work\ListofAllFilenamesToSearch.txt"
}
if (Test-Path "C:\Work\FilesThatNeedToBeChanged.txt") { # if file exists
    Remove-Item "C:\Work\FilesThatNeedToBeChanged.txt"
}
$filefolder1 = "C:\TestFolder\WorkFiles"
$ftype = "*.txt"
$filenames1 = Search $filefolder1 $ftype
$filenames1 | Out-File "C:\Work\ListofAllFilenamesToSearch.txt" -Width 2000
if (Test-Path "C:\Work\FilesThatNeedToBeChanged.txt") { # if file exists
    Remove-Item "C:\Work\FilesThatNeedToBeChanged.txt"
}
(Get-Content "C:\Work\NumberXrefList.CSV" | where { $_.readcount -gt 1 }) | foreach {
    $OldFieldValue, $NewFieldValue = $_.Split("|")
    $filenamelist = (Get-Content "C:\Work\ListofAllFilenamesToSearch.txt" -ReadCount 5) #|
    foreach ($j in $filenamelist) {
        #$testvar = (Get-Content $j )
        #$testvar = (Get-Content $j -ReadCount 100)
        $testvar = (Get-Content $j -Delimiter "\n")
        foreach ($i in $testvar)
        {
            if ($i -imatch $OldFieldValue) {
                $j + "|" + $OldFieldValue + "|" + $NewFieldValue | Out-File "C:\Work\FilesThatNeedToBeChanged.txt" -Width 2000 -Append
            }
        }
    }
}
$FileFolder = (Get-Content "C:\Work\FilesThatNeedToBeChanged.txt" -ReadCount 5)
Get-ChildItem $FileFolder -Recurse |
    select -ExpandProperty fullname |
    foreach {
        if (Select-String -Path $_ -SimpleMatch $OldFieldValue -Debug -Quiet) {
            (Get-Content $_) |
                ForEach-Object { $_ -replace $OldFieldValue, $NewFieldValue } |
                Set-Content $_ -WhatIf
        }
    }
In the code above, I've tried several things with Get-Content - default, with -ReadCount, and -Delimiter - in an attempt to avoid an out of memory error.
The only thing I have control over is the length of the old & new replacement strings file. Is there a way to do this in Powershell? Is there a better option/solution? I'm running Windows 7, Powershell version 3.0.
Your main problem is that you're reading each file over and over again, once per replacement term. You need to invert the loops: iterate over the files on the outside and over the replacement terms on the inside. Also, pre-load the CSV. Something like:
$filefolder1 = "C:\TestFolder\WorkFiles"
$ftype = "*.txt"
$filenames = gci -Path $filefolder1 -Filter $ftype -Recurse
$replaceValues = Import-Csv -Path "C:\Work\NumberXrefList.CSV"
foreach ($file in $filenames) {
    # read each file once...
    $contents = Get-Content -Path $file.FullName
    # ...apply every replacement to the in-memory content...
    foreach ($replaceValue in $replaceValues) {
        $contents = $contents -replace $replaceValue.OldValue, $replaceValue.NewValue
    }
    # ...then keep a backup and write the result back once
    Copy-Item $file.FullName "$($file.FullName).old"
    Set-Content -Path $file.FullName -Value $contents
}