More efficient way of using Import-CSV in PowerShell

I am trying to figure out a more efficient way of using Import-Csv (PowerShell) to place values into an array of CSV files. The problem is that some of these files have several hundred thousand lines, and running this script in conjunction with other code appears to be a big bottleneck. Do you have any suggestions for making this code more efficient and faster?
foreach ($csv in $csvfiles)
{
    $csvname  = $csv.Name
    $paygroup = $csvname.Substring(4, 3)
    $batch    = $csvname.Substring(14, 4)
    Write-Host "Writing $csvname"
    $csvimportdata = Import-Csv "$CurrentPath\$csvname"
    foreach ($record in $csvimportdata)
    {
        $record.chartfield1 = $paygroup
        $record.chartfield2 = $batch
        $record.chartfield3 = $record.line_descr.Substring(0, 6)
    }
    $csvimportdata | Export-Csv "$CurrentPath\$csvname" -NoTypeInformation
}

If your CSVs are large, then loading them entirely into memory is probably not a good idea. How about something like this:
foreach ($csv in $csvfiles)
{
    $csvname  = $csv.Name
    $paygroup = $csvname.Substring(4, 3)
    $batch    = $csvname.Substring(14, 4)
    Write-Host "Writing $csvname"
    # Stream one line at a time; write to a new file, since you cannot
    # Set-Content a file that Get-Content is still reading from
    Get-Content "$CurrentPath\$csvname" -ReadCount 1 | % {
        # Regex below assumes a three-column CSV
        $_ -replace '^([^,]+,[^,]+,[^,]{6}).*$', '$1'
    } | Set-Content "$CurrentPath\$csvname.new"
}
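If the chartfield columns must actually be populated (the regex above only truncates the third column), the same streaming idea can be pushed down to a raw StreamReader/StreamWriter. A minimal sketch, assuming the header names match the question's code and that no field contains embedded commas or quotes (otherwise a real CSV parser is needed):
foreach ($csv in $csvfiles) {
    $csvname  = $csv.Name
    $paygroup = $csvname.Substring(4, 3)
    $batch    = $csvname.Substring(14, 4)

    $inPath  = Join-Path $CurrentPath $csvname
    $outPath = "$inPath.tmp"

    $reader = [System.IO.File]::OpenText($inPath)
    $writer = New-Object System.IO.StreamWriter $outPath
    try {
        # Copy the header and locate the columns to rewrite by name
        $header = $reader.ReadLine()
        $names  = $header -split ','
        $cf1    = [array]::IndexOf($names, 'chartfield1')
        $cf2    = [array]::IndexOf($names, 'chartfield2')
        $cf3    = [array]::IndexOf($names, 'chartfield3')
        $descr  = [array]::IndexOf($names, 'line_descr')
        $writer.WriteLine($header)

        # Rewrite each data row field by field without building objects
        while (-not $reader.EndOfStream) {
            $fields       = $reader.ReadLine() -split ','
            $fields[$cf1] = $paygroup
            $fields[$cf2] = $batch
            $fields[$cf3] = $fields[$descr].Substring(0, 6)
            $writer.WriteLine($fields -join ',')
        }
    }
    finally {
        $reader.Close()
        $writer.Close()
    }
    Move-Item $outPath $inPath -Force
}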

Related

Changing multiple lines in a text file based on a psobject

I'm working on a script which will add some additional information to a txt file. This information is stored in a CSV file which looks like this (the data will differ each time the script is launched):
Number;A;B;ValueOfB
FP01340/05/20;0;1;GTU_01,GTU_03
FP01342/05/20;1;0;GTU01
The txt file looks like this (data inside will of course differ each time):
1|1|FP01340/05/20|2020-05-02|2020-05-02|2020-05-02|166,91|203,23|36,32|nothing interesting 18|33333|63-111 somewhere|||||
2|zwol|9,00|9,00|0,00
2|23|157,91|194,23|36,32
1|1|FP01341/05/20|2020-05-02|2020-05-02|2020-05-02|12,19|14,99|2,80|Some info |2222222|blabla|11-111 something||||
2|23|12,19|14,99|2,80
1|1|FP01342/05/20|2020-05-02|2020-05-02|2020-05-02|525,36|589,64|64,28|bla|222222|blba 36||62030|something||
2|5|213,93|224,63|10,70
2|8|120,34|129,97|9,63
2|23|191,09|235,04|43,95
What I need to do is find a line which contains 'Number' and then append the values 'A' and 'B' from the CSV in the form |0|1, and then on the first line below it, at the end, append 'ValueOfB' in the form |AAA_01,AAA_03.
So the first two lines should look like this at the end:
1|1|FP01340/05/20|2020-05-02|2020-05-02|2020-05-02|166,91|203,23|36,32|nothing interesting 18|33333|63-111 somewhere||||||0|1
2|zwol|9,00|9,00|0,00|AAA_01,AAA_03
2|23|157,91|194,23|36,32
The rest of the lines should not be touched.
I made a script which uses Select-String with context to find the lines I need, puts them into an object, then appends the new values to the found strings and puts the result into another object.
My script is as follows:
$csvFile = Import-Csv -Path Somepath\file.csv -Delimiter ";"
$file = "Somepath2\SomeName.txt"

$LinesToChange = @()
$script:LinesToChange = $LinesToChange
$LinesOriginal = @()
$script:LinesOriginal = $LinesOriginal

foreach ($line in $csvFile) {
    Select-String -Path $file -Pattern "$($Line.number)" -Encoding default -Context 0, 1 | ForEach-Object {
        $1 = $_.Line
        $2 = $_.Context.PostContext
    }
    $ListOrg = [pscustomobject]@{
        Line_org     = $1
        Line_GTU_org = $2
    }
    $LinesOriginal = $LinesOriginal + $ListOrg

    $lineNew = $ListOrg.Line_org | foreach { $_ + "|$($line.A)|$($line.B)" }
    $GTUNew  = $ListOrg.Line_GTU_org | foreach { $_ + "|$($line.ValueofB)" }

    $ListNew = [pscustomobject]@{
        Line_new     = $lineNew
        Line_GTU_new = $GTUNew
        Line_org     = $ListOrg.Line_org
        Line_GTU_org = $ListOrg.Line_GTU_org
    }
    $LinesToChange = $LinesToChange + $ListNew
}
The output is an object $LinesToChange which has the original lines and the lines after the change. The issue is I have no idea how to use that to change the txt file. I tried a few methods and ended up with a file which contains the updated lines but with all the others doubled (I tried foreach), or PowerShell used up all the RAM and couldn't finish the job :)
My latest idea is to use something like that:
(Get-Content -Path $file) | ForEach-Object {
    $line = $_
    $LinesToChange.GetEnumerator() | ForEach-Object {
        if ($line -match "$($LinesToChange.Line_org)") {
            $line = $line -replace "$($LinesToChange.Line_org)", "$($LinesToChange.Line_new)"
        }
        if ($line -match "$($LinesToChange.Line_GTU_org)") {
            $line = $line -replace "$($LinesToChange.Line_GTU_org)", "$($LinesToChange.Line_GTU_new)"
        }
    }
} | Set-Content -Path Somehere\newfile.txt
It seemed promising at first, but the expanded variable contains all the lines at once, so it never finds a match.
Also, I need to be sure that the second line is directly below the first one (it is unlikely, but it could happen that two or more lines contain the same data, while the 'Number' from the CSV file is unique), so preferably the change would match the two lines together; in short:
find these two lines:
1|1|FP01340/05/20|2020-05-02|2020-05-02|2020-05-02|166,91|203,23|36,32|nothing interesting 18|33333|63-111 somewhere|||||
2|zwol|9,00|9,00|0,00
change them to:
1|1|FP01340/05/20|2020-05-02|2020-05-02|2020-05-02|166,91|203,23|36,32|nothing interesting 18|33333|63-111 somewhere||||||0|1
2|zwol|9,00|9,00|0,00|AAA_01,AAA_03
Do that for all the lines in $LinesToChange.
Any help will be much appreciated!
Greetings!
Some strange text file you have there, but anyway, this should do it:
# read in the text file as a string array
$txt = Get-Content -Path '<PathToTheTextFile>'
$csv = Import-Csv -Path '<PathToTheCSVFile>' -Delimiter ';'

# loop through the items (rows) in the CSV and find matching lines in the text array
foreach ($item in $csv) {
    $match = $txt | Select-String -Pattern ('|{0}|' -f $item.Number) -SimpleMatch
    if ($match) {
        # update the matching text line (array indices count from 0, so we do -1)
        $txt[$match.LineNumber - 1] += ('|{0}|{1}' -f $item.A, $item.B)
        # update the line following it
        $txt[$match.LineNumber] += ('|{0}' -f $item.ValueOfB)
    }
}

# show the updated text on screen
$txt

# save the updated text to file
$txt | Set-Content -Path 'Somehere\newfile.txt'
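Regarding your concern about duplicate lines: if the same number could ever match more than one line, Select-String returns one match object per hit and the indexing above would break. A defensive variant (a sketch; it assumes the invoice header lines always start with 1|1|<number>|, as in your sample) anchors the pattern and takes only the first hit:
foreach ($item in $csv) {
    # Anchor to the header lines ("1|1|<number>|...") so detail lines cannot match,
    # and take only the first hit in case of duplicates
    $pattern = '^1\|1\|{0}\|' -f [regex]::Escape($item.Number)
    $match   = $txt | Select-String -Pattern $pattern | Select-Object -First 1
    if ($match) {
        $txt[$match.LineNumber - 1] += ('|{0}|{1}' -f $item.A, $item.B)
        $txt[$match.LineNumber]     += ('|{0}' -f $item.ValueOfB)
    }
}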

How to separate CSV values within a CSV into new rows in PowerShell

I'm receiving an automated report from a system that cannot be modified as a CSV. I am using PowerShell to split the CSV into multiple files and parse out the specific data needed. The CSV contains columns that may contain no data, 1 value, or multiple values that are comma separated within the CSV file itself.
Example (UPDATED FOR CLARITY):
"Group","Members"
"Event","362403"
"Risk","324542, 340668, 292196"
"Approval","AA-334454, 344366, 323570, 322827, 360225, 358850, 345935"
"ITS","345935, 358850"
"Services",""
I want the data to have one entry per line, like this (UPDATED FOR CLARITY):
"Group","Members"
"Event","362403"
"Risk","324542"
"Risk","340668"
"Risk","292196"
#etc.
I've tried splitting the data and I just get an unknown number of columns at the end.
I tried a foreach loop, but can't seem to get it right (pseudocode below):
Import-CSV $Groups
ForEach ($line in $Groups) {
    If ($_.'Members'.count -gt 1, add-content "$_.Group,$_.Members[2]",)
}
I appreciate any help you can provide. I've searched all the stackexchange posts and used Google but haven't been able to find something that addresses this exact issue.
Import-Csv .\input.csv | ForEach-Object {
    ForEach ($Member in ($_.Members -Split ',')) {
        [PSCustomObject]@{ Group = $_.Group; Member = $Member.Trim() }
    }
} | Export-Csv .\output.csv -NoTypeInformation
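Emitting one [PSCustomObject] per member value keeps everything in a single pipeline, so Export-Csv opens the output file once instead of once per row.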
# Get the raw text contents
$CsvContents = Get-Content "\path\to\file.csv"

# Convert it to a table object
$CsvData = ConvertFrom-Csv -InputObject $CsvContents

# Iterate through the records in the table
ForEach ($Record in $CsvData) {
    # Split the Members value at commas and trim whitespace from each piece
    $Record.Members -Split "," | % {
        $Member = $_.Trim()
        # Skip empty values (a string comparison: "" is not -gt "1")
        if ($Member -gt 1) {
            # Build the output line
            $OutputString = "$($Record.Group), $Member"
            # Append it to the output file
            Add-Content -Path "\path\to\output.txt" -Value $OutputString
        }
    }
}
This should work; you had the right idea, but I think you may have been running into some syntax issues. Let me know if you have questions :)
Revised the code as per your updated question:
$List = Import-Csv "\path\to\input.csv"

foreach ($row in $List) {
    $Group = $row.Group
    $Members = $row.Members -split ","

    # Process each value in Members
    foreach ($MemberValue in $Members) {
        # PS v3 and above (piping a bare string to Export-Csv would only emit its
        # Length property, so wrap the values in an object first)
        [PSCustomObject]@{ Group = $Group; Member = $MemberValue.Trim() } |
            Export-Csv "\path\to\output.csv" -NoTypeInformation -Append

        # PS v2
        # $Group + "," + $MemberValue.Trim() | Out-File "\path\to\output.csv" -Append
    }
}

New columns into CSV file incredibly slow

I have a bunch of .csv files and I'm trying to add some new column headers and their values (which are all blank anyway), then output this to a new .csv file. My script currently runs and works fine, but it takes about 5 minutes to complete the operation on a 60 MB file with about 70,000 rows. I have about 100 files to do this on, so it will take a while using this script.
My code is below; it's quite simple but clearly inefficient!
Import-Csv $strFilePath |
    Select-Object *, @{Name='NewHeader';Expression={''}},
        @{Name='NewHeader2';Expression={''}},
        @{Name='NewHeader3';Expression={''}},
        @{Name='NewHeader4';Expression={''}} |
    Export-Csv $($strFilePath + ".new") -NoTypeInformation
As pointed out in the comments, it would be better to treat the files as plain text and skip the needless object conversion.
$path = 'C:\test'
$newHeaders = 'NewHeader1','NewHeader2','NewHeader3','NewHeader4'
$files = Get-ChildItem -LiteralPath $path -Filter *.csv

$newHeadersString = @(''; $newHeaders | foreach { '"{0}"' -f $_ }) -join ','
$newColumnsString = ',""' * $newHeaders.Count

foreach ($file in $files) {
    $sr = $file.OpenText()
    $outfile = New-Item ($file.FullName + '.new') -Force
    $sw = [IO.StreamWriter]::new($outfile.FullName)

    # Append the new header names to the first line
    $sw.WriteLine($sr.ReadLine() + $newHeadersString)
    # Append empty columns to every remaining line
    while (!$sr.EndOfStream) { $sw.WriteLine($sr.ReadLine() + $newColumnsString) }

    $sr.Close()
    $sw.Close()
}
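Because the loop never parses the rows into objects, each line costs only a ReadLine, a string concatenation, and a WriteLine, which is why it is so much faster than building and re-serializing 70,000 objects per file through Import-Csv and Export-Csv.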

How to modify contents of a pipe-delimited text file with PowerShell

I have a pipe-delimited text file. The file contains "records" of various types. I want to modify certain columns for each record type. For simplicity, let's say there are 3 record types: A, B, and C. A has 3 columns, B has 4 columns, and C has 5 columns. For example, we have:
A|stuff|more_stuff
B|123|other|x
C|something|456|stuff|more_stuff
B|78903|stuff|x
A|1|more_stuff
I want to append the prefix "P" to all desired columns. For A, the desired column is 2. For B, the desired column is 3. For C, the desired column is 4.
So, I want the output to look like:
A|Pstuff|more_stuff
B|123|Pother|x
C|something|456|Pstuff|more_stuff
B|78903|Pstuff|x
A|P1|more_stuff
I need to do this in PowerShell. The file could be very large, so I'm thinking about going with the File class of .NET. If it were a simple string replacement, I would do something like:
$content = [System.IO.File]::ReadAllText("H:\test_modify_contents.txt").Replace("replace_text","something_else")
[System.IO.File]::WriteAllText("H:\output_file.txt", $content)
But it's not so simple in my particular situation, so I'm not even sure whether ReadAllText and WriteAllText are the best solution. Any ideas on how to do this?
I would use ConvertFrom-Csv so you can check each line as an object. In this code I added a header, mainly for readability; it is cut out of the output on the last line anyway:
# $input is an automatic variable in PowerShell, so use different names
$inFile = "H:\test_modify_contents.txt"
$outFile = "H:\output_file.txt"

$data = Get-Content -Path $inFile | ConvertFrom-Csv -Delimiter '|' -Header 'Column1','Column2','Column3','Column4','Column5'

$data | % {
    If ($_.Column5) {
        # type C:
        $_.Column4 = "P$($_.Column4)"
    } ElseIf ($_.Column4) {
        # type B:
        $_.Column3 = "P$($_.Column3)"
    } Else {
        # type A:
        $_.Column2 = "P$($_.Column2)"
    }
}

$data | Select Column1,Column2,Column3,Column4,Column5 |
    ConvertTo-Csv -Delimiter '|' -NoTypeInformation |
    Select-Object -Skip 1 |
    Set-Content -Path $outFile
It does add extra | characters to the type A and B lines. Output:
"A"|"Pstuff"|"more_stuff"||
"B"|"123"|"Pother"|"x"|
"C"|"something"|"456"|"Pstuff"|"more_stuff"
"B"|"78903"|"Pstuff"|"x"|
"A"|"P1"|"more_stuff"||
If your file sizes are large, then reading the complete file contents at once using Import-Csv or ReadAllText is probably not a good idea. I would use the Get-Content cmdlet with the -ReadCount parameter, which will stream the file one row at a time, and then use a regex for the processing. Something like this:
Get-Content your_in_file.txt -ReadCount 1 | % {
    $_ -replace '^(A\||B\|[^\|]+\||C\|[^\|]+\|[^\|]+\|)(.*)$', '$1P$2'
} | Set-Content your_out_file.txt
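The alternation in the capture group matches everything up to and including the pipe that precedes the target column for each record type (one leading field for A, two for B, three for C), so the replacement writes that prefix back ($1), inserts the P, and appends the rest of the line ($2).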
EDIT:
This version should output faster:
$d = Get-Date
Get-Content input.txt -ReadCount 1000 | % {
    $_ | % {
        $_ -replace '^(A\||B\|[^\|]+\||C\|[^\|]+\|[^\|]+\|)(.*)$', '$1P$2'
    } | Add-Content output.txt
}
# TotalMilliseconds, not Milliseconds: runs over one second would otherwise report only the remainder
(New-TimeSpan $d (Get-Date)).TotalMilliseconds
For me this processed 50k rows in 350 milliseconds. You can probably get more speed by tweaking the -ReadCount value to find the ideal amount.
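(Measure-Command { ... } is another way to take the same timing; it returns a TimeSpan directly.)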
Given the large input file, I would not use either ReadAllText or Get-Content. ReadAllText reads the entire file into memory, and plain Get-Content pushes every line through the pipeline, which is slow.
Consider using something along the lines of
$filename = ".\input2.csv"
$outfilename = ".\output2.csv"

function ProcessFile($inputfilename, $outputfilename)
{
    $reader = [System.IO.File]::OpenText($inputfilename)
    $writer = New-Object System.IO.StreamWriter $outputfilename
    $record = $reader.ReadLine()
    while ($record -ne $null)
    {
        $writer.WriteLine(($record -replace '^(A\||B\|[^\|]+\||C\|[^\|]+\|[^\|]+\|)(.*)$', '$1P$2'))
        $record = $reader.ReadLine()
    }
    $reader.Close()
    $reader.Dispose()
    $writer.Close()
    $writer.Dispose()
}

ProcessFile $filename $outfilename
EDIT: After testing all the suggestions on this page, I borrowed the regex from Dave Sexton, and this is the fastest implementation. It processes a 1 GB+ file in 175 seconds. All the other implementations are significantly slower on large input files.

Parsing multiple entries in multiple CSV files

Hope someone can offer a suggestion to help me speed up a PowerShell script. What I am doing is reading in hundreds of CSV files, parsing the information to find missing entries, and then writing that output to an HTML file. Here is the loop that I am using to process the files:
ForEach ($Filename in $FileList) {
    $CustTemp = Import-Csv "$FilePath\$Filename"
    $CustName = $CustTemp[0].CustName
    Write-Host "Reading data for $CustName"`r
    For ($counter = 0; $counter -lt 31; $counter++) {
        $CheckDate = (Get-Date).AddDays(-$counter)
        $CheckShortDate = $CheckDate.ToShortDateString()
        $TempData = Import-Csv "$FilePath\$Filename" | Select FileName,FileDate | where {$_.FileDate -eq $CheckShortDate}
        If ($TempData -eq $null) {
            $row = "No file found for $CheckShortDate for $CustName"
            $HTMLReportItems += $row
        }
        $HTMLReportItems = $HTMLReportItems | ConvertTo-Html -Fragment
    }
}
This loop worked fine when I was testing with a few CSV files, but when running it against a large number of files (300+), the loop takes an extremely long time to complete for each file (30s-1m). I'm pretty sure the reason is that each CSV file is being read more than 30 times per file. I'm hoping someone has a better suggestion on how I can process the data.
You're reading $FilePath\$Filename multiple times. Read it outside the for loop and only do the filtering inside. Move the HTML generation outside the loop as well.
$HTMLReportItems = foreach ($Filename in $FileList) {
    $csv = Import-Csv (Join-Path $FilePath $Filename)
    $CustName = $csv[0].CustName
    $data = $csv | select FileName,FileDate
    Write-Host "Reading data for $CustName"

    for ($counter = 0; $counter -lt 31; $counter++) {
        $CheckShortDate = (Get-Date).AddDays(-$counter).ToShortDateString()
        $TempData = $data | ? {$_.FileDate -eq $CheckShortDate}
        if ($TempData -eq $null) {
            "No file found for $CheckShortDate for $CustName"
        }
    }
}
$HTMLReportItems = $HTMLReportItems | ConvertTo-Html -Fragment
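If that is still slow, the remaining cost is the 31 linear scans of $data per file. A further refinement (a sketch, assuming the FileDate strings are stored in the same format that ToShortDateString() produces) indexes each file's dates into a hashtable once, so every check becomes a constant-time lookup:
$HTMLReportItems = foreach ($Filename in $FileList) {
    $csv = Import-Csv (Join-Path $FilePath $Filename)
    $CustName = $csv[0].CustName
    Write-Host "Reading data for $CustName"

    # Index the dates that actually appear in this file (one pass)
    $dates = @{}
    foreach ($row in $csv) { $dates[$row.FileDate] = $true }

    for ($counter = 0; $counter -lt 31; $counter++) {
        $CheckShortDate = (Get-Date).AddDays(-$counter).ToShortDateString()
        if (-not $dates.ContainsKey($CheckShortDate)) {
            "No file found for $CheckShortDate for $CustName"
        }
    }
}
$HTMLReportItems = $HTMLReportItems | ConvertTo-Html -Fragment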