PowerShell Import-Csv - compare with a text file. Speed optimization?

I have written a PowerShell script which compares words from a text file with a CSV column, and if the word in the column matches, the line is deleted.
My code works correctly, but it is very slow because the CSV file is saved and copied after every line. Is there a way to improve this?
$reader = [System.IO.File]::OpenText($fc_file.Text)
try {
    for () {
        $line = $reader.ReadLine()
        if ($line -eq $null) { break }
        if ($line -eq "") { break }
        # process the line
        $fc_suchfeld = $fc_ComboBox.Text
        $tempstorage = $scriptPath + "\temp\temp.csv"
        Import-Csv $tempfile -Delimiter $delimeter -Encoding $char |
            Where-Object { $_.$fc_suchfeld -notmatch [regex]::Escape($line) } |
            Export-Csv $tempstorage -Delimiter $delimeter -Encoding $char -NoTypeInformation
        Remove-Item $tempfile
        Rename-Item $tempstorage $tempfile_ext
    }
}
finally {
    $reader.Close()
}
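One way to speed this up considerably is to read all the search words first, combine them into a single regex alternation, and then import, filter, and export the CSV only once. A minimal sketch, assuming the same variables ($tempfile, $delimeter, $char, $fc_suchfeld, $scriptPath, $tempfile_ext) are set as in the script above:

# Collect all non-empty search words from the text file
$words = Get-Content $fc_file.Text | Where-Object { $_ -ne '' }
if ($words) {
    # Build one alternation pattern: word1|word2|word3 (each regex-escaped)
    $pattern = ($words | ForEach-Object { [regex]::Escape($_) }) -join '|'
    $tempstorage = $scriptPath + "\temp\temp.csv"
    # Import once, filter once, export once - no per-word save/copy/rename
    Import-Csv $tempfile -Delimiter $delimeter -Encoding $char |
        Where-Object { $_.$fc_suchfeld -notmatch $pattern } |
        Export-Csv $tempstorage -Delimiter $delimeter -Encoding $char -NoTypeInformation
    Remove-Item $tempfile
    Rename-Item $tempstorage $tempfile_ext
}

This replaces one full import/export cycle per word with a single pass over the CSV.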

Related

How to update cells in a CSV one by one without knowing column names

I'm trying to replace some text inside a CSV like the following:
$csv = Import-Csv $csvFileName -Delimiter ';'
foreach ($line in $csv)
{
    $properties = $line | Get-Member -MemberType Properties
    for ($i = 0; $i -lt $properties.Count; $i++)
    {
        $column = $properties[$i]
        $value = $line | Select-Object -ExpandProperty $column.Name
        # Convert numbers with , as separator to . separator
        if ($value -match "^[+-]?(\d*\,)?\d+$")
        {
            $value = $value -replace ",", "."
            # HOW TO update the CSV cell with the new value here???
            # ???
        }
    }
}
$csv | Export-Csv $csvFileName -Delimiter ',' -NoTypeInformation -Encoding UTF8
As you can see, I'm missing the line where I update the CSV cell with the new value. Can someone tell me how I can do that?
Assuming the regex you have in place matches the pattern you expect, you can simplify your code using two foreach loops, which is easier than a for loop. This approach uses the intrinsic PSObject member, available on all PowerShell objects.
$csv = Import-Csv $csvFileName -Delimiter ';'
foreach ($line in $csv) {
    foreach ($property in $line.PSObject.Properties) {
        if ($property.Value -match "^[+-]?(\d*\,)?\d+$") {
            # regex operator not needed in this case
            $property.Value = $property.Value.Replace(",", ".")
        }
    }
}
$csv | Export-Csv ....
You can also do the whole process in the pipeline (the method above should be clearly faster, but this one is likely to be friendlier on memory):
Import-Csv $csvFileName -Delimiter ';' | ForEach-Object {
    foreach ($property in $_.PSObject.Properties) {
        if ($property.Value -match "^[+-]?(\d*\,)?\d+$") {
            # regex operator not needed in this case
            $property.Value = $property.Value.Replace(",", ".")
        }
    }
    $_ # => output this object
} | Export-Csv myexport.csv -NoTypeInformation

Memory exception while filtering large CSV files

I'm getting a memory exception while running this code. Is there a way to filter one file at a time, write the output, and append after processing each file? The code below seems to load everything into memory.
$inputFolder = "C:\Change\2019\October"
$outputFile = "C:\Change\2019\output.csv"
Get-ChildItem $inputFolder -File -Filter '*.csv' |
    ForEach-Object { Import-Csv $_.FullName } |
    Where-Object { $_.machine_type -eq 'workstations' } |
    Export-Csv $outputFile -NoTypeInformation
Maybe you can import and filter your files one by one and append the result to your output file, like this:
$inputFolder = "C:\Change\2019\October"
$outputFile = "C:\Change\2019\output.csv"
Remove-Item $outputFile -Force -ErrorAction SilentlyContinue
Get-ChildItem $inputFolder -File -Filter "*.csv" | ForEach-Object {
    Import-Csv $_.FullName |
        Where-Object machine_type -eq 'workstations' |
        Export-Csv $outputFile -Append -NoTypeInformation
}
Note: The reason for not using Get-ChildItem ... | Import-Csv ... - i.e., for not piping Get-ChildItem output directly to Import-Csv and instead calling Import-Csv from the script block ({ ... }) of an auxiliary ForEach-Object call - is a bug in Windows PowerShell that has since been fixed in PowerShell Core; see the bottom section for a more concise workaround.
However, even output from ForEach-Object script blocks should stream to the remaining pipeline commands, so you shouldn't run out of memory - after all, a salient feature of the PowerShell pipeline is object-by-object processing, which keeps memory use constant, irrespective of the size of the (streaming) input collection.
You've since confirmed that avoiding the auxiliary ForEach-Object call does not solve the problem, so we still don't know what causes your out-of-memory exception.
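To illustrate the streaming point with a contrived sketch (big.csv and filtered.csv are placeholder names, not from the question): the pipeline form processes one row at a time, whereas assigning the import to a variable first forces every row into memory before filtering begins:

# Streams: each row flows through Where-Object to Export-Csv one at a time
Import-Csv big.csv |
    Where-Object { $_.machine_type -eq 'workstations' } |
    Export-Csv filtered.csv -NoTypeInformation

# Buffers: the assignment materializes ALL rows in memory before filtering
$all = Import-Csv big.csv
$all | Where-Object { $_.machine_type -eq 'workstations' } |
    Export-Csv filtered.csv -NoTypeInformation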
Update:
This GitHub issue contains clues as to the reason for excessive memory use, especially with many properties that contain small amounts of data.
This GitHub feature request proposes using strongly typed output objects to help the issue.
The following workaround, which uses the switch statement to process the files as text files, may help:
$header = ''
Get-ChildItem $inputFolder -Filter *.csv | ForEach-Object {
    $i = 0
    switch -Wildcard -File $_.FullName {
        '*workstations*' {
            # NOTE: If no other columns contain the word `workstations`, you can
            # simplify and speed up the command by omitting the `ConvertFrom-Csv` call
            # (you can make the wildcard matching more robust with something
            # like '*,workstations,*')
            if ((ConvertFrom-Csv "$header`n$_").machine_type -ne 'workstations') { continue }
            $_ # row whose 'machine_type' column value equals 'workstations'
        }
        default {
            if ($i++ -eq 0) {
                if ($header) { continue }  # header already written
                else { $header = $_; $_ }  # header row of 1st file
            }
        }
    }
} | Set-Content $outputFile
Here's a workaround for the bug of not being able to pipe Get-ChildItem output directly to Import-Csv, by passing it as an argument instead:
Import-Csv -LiteralPath (Get-ChildItem $inputFolder -File -Filter *.csv) |
    Where-Object { $_.machine_type -eq 'workstations' } |
    Export-Csv $outputFile -NoTypeInformation
Note that in PowerShell Core you could more naturally write:
Get-ChildItem $inputFolder -File -Filter *.csv | Import-Csv |
    Where-Object { $_.machine_type -eq 'workstations' } |
    Export-Csv $outputFile -NoTypeInformation
Solution 2:
$inputFolder = "C:\Change\2019\October"
$outputFile = "C:\Change\2019\output.csv"
$encoding = [System.Text.Encoding]::UTF8 # modify encoding if necessary
$Delimiter = ','

# Find the header for your files => take the first row of the first non-empty file
$Header = Get-ChildItem -Path $inputFolder -Filter *.csv |
    Where-Object Length -gt 0 |
    Select-Object -First 1 |
    Get-Content -TotalCount 1

# If no header was found, there is no file with size > 0 => quit
if (!$Header) { return }

# Split the header into an array of column names
$HeaderArray = $Header -split $Delimiter -replace '"', ''

# Open the output file and write the header we found
$w = New-Object System.IO.StreamWriter($outputFile, $true, $encoding)
$w.WriteLine($Header)

# Loop over the CSV files
Get-ChildItem $inputFolder -File -Filter "*.csv" | ForEach-Object {
    # Open the current file for reading
    $r = New-Object System.IO.StreamReader($_.FullName, $encoding)
    $skiprow = $true
    while (($line = $r.ReadLine()) -ne $null)
    {
        # Skip the header row of each file
        if ($skiprow)
        {
            $skiprow = $false
            continue
        }
        # Build an object for the current row using the header found above
        $Object = $line | ConvertFrom-Csv -Header $HeaderArray -Delimiter $Delimiter
        # Write the row to the output file if it matches the condition asked for
        if ($Object.machine_type -eq 'workstations') { $w.WriteLine($line) }
    }
    $r.Close()
    $r.Dispose()
}
$w.Close()
$w.Dispose()
You have to read and write to the .csv files one row at a time, using StreamReader and StreamWriter:
$filepath = "C:\Change\2019\October"
$outputfile = "C:\Change\2019\output.csv"
$encoding = [System.Text.Encoding]::UTF8
# Note: machine_type is a CSV column, not a file property, so the filter
# must run on the rows while reading, not on the Get-ChildItem output.
$files = Get-ChildItem -Path $filepath -Filter *.csv
$w = New-Object System.IO.StreamWriter($outputfile, $true, $encoding)
$header = $null
foreach ($file in $files)
{
    $r = New-Object System.IO.StreamReader($file.FullName, $encoding)
    $firstLine = $true
    while (($line = $r.ReadLine()) -ne $null)
    {
        if ($firstLine)
        {
            # Write the header only once, taken from the first file
            if (!$header) { $header = $line; $w.WriteLine($header) }
            $firstLine = $false
            continue
        }
        # Keep only rows whose machine_type column is 'workstations'
        if ((ConvertFrom-Csv "$header`n$line").machine_type -eq 'workstations')
        {
            $w.WriteLine($line)
        }
    }
    $r.Close()
    $r.Dispose()
}
$w.Close()
$w.Dispose()
Get-Content *.csv | Add-Content combined.csv
Make sure combined.csv doesn't exist when you run this, or it's going to go full Ouroboros.
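One way to avoid the self-append, a small sketch assuming combined.csv lives in the same folder as the inputs, is to exclude the output file explicitly:

# Exclude the output file so its contents are never fed back into itself
Get-ChildItem -Path *.csv -Exclude combined.csv | Get-Content | Add-Content combined.csv

Note that this simple concatenation also repeats each input file's header row; the row-by-row approaches above avoid that.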

Load files using a table of contents in PowerShell

I'm using the following code to load SQL scripts from a folder and execute them.
foreach ($sqlScript in Get-ChildItem -Path "$pathToScripts" -Filter *.sql | Sort-Object) {
    Write-Host "Running Script " $sqlScript.Name
    # Execute the query
    switch ($removeComments) {
        $true {
            (Get-Content $sqlScript.FullName -Encoding UTF8 | Out-String) -replace '(?s)/\*.*?\*/', " " -split '\r?\ngo\r?\n' -notmatch '^\s*$' |
                ForEach-Object { $SqlCmd.CommandText = $_.Trim(); $reader = $SqlCmd.ExecuteNonQuery() }
        }
        $false {
            (Get-Content $sqlScript.FullName -Encoding UTF8 | Out-String) -split '\r?\ngo\r?\n' |
                ForEach-Object { $SqlCmd.CommandText = $_.Trim(); $reader = $SqlCmd.ExecuteNonQuery() }
        }
    }
}
I've been asked if it's possible to have some sort of table of contents to execute these files in a particular sequence without having to rename them. Is it possible to have a comma-delimited file that I could loop through and load each file in the same sequence?
Edit
This is the code I think I'm going to go with:
foreach ($file in (Get-Content $executionOrder)) {
    $sqlScript = Join-Path $pathToScripts $file
    Write-Host "Running Script " $file
    # Execute the query
    switch ($removeComments) {
        $true {
            (Get-Content $sqlScript -Encoding UTF8 | Out-String) -replace '(?s)/\*.*?\*/', " " -split '\r?\ngo\r?\n' -notmatch '^\s*$' |
                ForEach-Object { $SqlCmd.CommandText = $_.Trim(); $reader = $SqlCmd.ExecuteNonQuery() }
        }
        $false {
            (Get-Content $sqlScript -Encoding UTF8 | Out-String) -split '\r?\ngo\r?\n' |
                ForEach-Object { $SqlCmd.CommandText = $_.Trim(); $reader = $SqlCmd.ExecuteNonQuery() }
        }
    }
}
Is it possible to have a comma-delimited file that I could loop through and load each file in the same sequence?
Yes. You just need to update your outer loop logic to account for that input. With only minor changes you can get what you want.
foreach ($sqlScript in (Import-Csv $pathtoCSV)) {
    # Process file.
}
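For the loop body to locate the scripts, the CSV needs a column naming each file. A hypothetical sketch (the Order and FileName columns are assumptions for illustration, not from the original question):

# Hypothetical contents of $pathtoCSV:
#   Order,FileName
#   1,create_tables.sql
#   2,seed_data.sql
#   3,create_views.sql
foreach ($sqlScript in (Import-Csv $pathtoCSV | Sort-Object { [int]$_.Order })) {
    $scriptFilePath = Join-Path $pathToScripts $sqlScript.FileName
    Write-Host "Running Script" $sqlScript.FileName
    # ... execute the file's queries as in the original switch block ...
}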
That would work if you wanted CSV file input, as you requested. From the comments, it looks like you are actually getting a static list of file names in a predefined directory.
$pathToFileList = "C:\Bagel.txt"
$rootScriptDirectory = "\\path\to\scripts"
$removeComments = $true
Get-Content $pathToFileList | ForEach-Object {
    # Build the full file path
    $scriptFilePath = [io.path]::Combine($rootScriptDirectory, $_)
    # If this file actually exists then it should be processed
    if (Test-Path $scriptFilePath -PathType Leaf) {
        # Get the file contents
        $fileContents = Get-Content $scriptFilePath -Encoding UTF8 | Out-String
        # Clean the file contents as required
        if ($removeComments) {
            $queries = $fileContents -replace '(?s)/\*.*?\*/', " " -split '\r?\ngo\r?\n' -notmatch '^\s*$'
        } else {
            $queries = $fileContents -split '\r?\ngo\r?\n'
        }
        # Execute each query of the file
        $queries | ForEach-Object {
            $SqlCmd.CommandText = $_.Trim()
            $reader = $SqlCmd.ExecuteNonQuery()
        }
        # Hilarity ensues
    } else {
        Write-Warning "Could not locate the file '$scriptFilePath'"
    }
}
The features of switch are a little wasted here since you only have two states, so move the things that actually change into an if block. Get the file list and test that each file exists. Open it and parse the queries from it with the logic you already have.

Windows PowerShell to replace a set of characters in text files in a folder

I'm working on code which replaces a set of characters in the text files in a folder. Is there a way it can do this for all the files in the folder? I am using Windows 7 and PowerShell version 3. The code I have is attached. The issue is that it creates a new file when I run the code (New_NOV_1995.txt), but it doesn't change any characters in the new file as specified in the code. Help very much appreciated.
$lookupTable = @{
    '¿' = '|'
    'Ù' = '|'
    'À' = '|'
    'Ú' = '|'
    '³' = '|'
    'Ä' = '-'
}
$original_file = 'C:\FilePath\NOV_1995.txt'
$destination_file = 'C:\FilePath\NOV_1995_NEW.txt'
Get-Content -Path $original_file | ForEach-Object {
    $line = $_
    $lookupTable.GetEnumerator() | ForEach-Object {
        if ($line -match $_.Key)
        {
            $line = $line -replace $_.Key, $_.Value
        }
    }
    $line
} | Set-Content -Path $destination_file
In the following example, I'm assuming that H:\Replace_String is a directory. In your code above, you don't have a backslash so it would only select files in the root of H:.
$configFiles = Get-ChildItem -Path H:\Replace_String\*.txt
foreach ($file in $configFiles)
{
    (Get-Content $file) |
        ForEach-Object { $_ -replace "Cat", "New_Cat" } |
        ForEach-Object { $_ -replace "Dog", "New_Dog" } |
        Set-Content $file
}
The (original) answer proposed by Tony Hinkle needs another loop. The reason for this is that Get-Content produces an array. Each line represents an element of the array.
$configFiles = Get-ChildItem -Path 'H:\Replace_String\*.txt'
foreach ($file in $configFiles) {
    $output = @()
    $content = Get-Content $file
    foreach ($line in $content) {
        $line = $line.Replace("Cat", "New_Cat")
        $line = $line.Replace("Dog", "New_Dog")
        $output += $line
    }
    $output | Set-Content -Path $file
}
Edit: I noticed that Tony Hinkle's answer was modified as I posted this. He's sending everything through a pipeline, where I'm storing the array in a variable and then looping through it. The pipeline method is probably more memory efficient. The variable with a second loop over each element of the array is more easily modified to do more than just the two replacements.
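Neither answer wires the question's character lookup table into a folder loop. A minimal sketch combining the two, assuming $lookupTable is defined as in the question and the files should be rewritten in place:

# Apply the question's lookup table to every .txt file in the folder
foreach ($file in (Get-ChildItem -Path 'C:\FilePath\*.txt')) {
    $content = Get-Content -Path $file.FullName
    foreach ($entry in $lookupTable.GetEnumerator()) {
        # -replace treats the key as a regex; these single characters are safe,
        # but use [regex]::Escape($entry.Key) for keys with regex metacharacters
        $content = $content -replace $entry.Key, $entry.Value
    }
    Set-Content -Path $file.FullName -Value $content
}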

PowerShell: replacing strings using a hash table

I have files which need to be modified according to a mapping provided in a CSV. I want to read each line of my txt file and, if the specified value exists, replace other strings in that line according to my CSV mapping. For that purpose I have used a hash table. Here is my PowerShell script:
$file ="path\map.csv"
$mapping = Import-CSV $file -Encoding UTF8 -Delimiter ";"
$table = $mapping | Group-Object -AsHashTable -AsString -Property Name
$original_file = "path\input.txt"
$destination_file = "path\output.txt"
$content = Get-Content $original_file
foreach ($line in $content){
foreach ($e in $table.GetEnumerator()) {
if ($line -like "$($e.Name)") {
$line = $line -replace $e.Values.old_category, $e.Values.new_category
$line = $line -replace $e.Values.old_type, $e.Values.new_type
}
}
}
Set-Content -Path $destination_file -Value $content
My map.csv looks as follows:
Name;new_category;new_type;old_category;old_type
alfa;new_category1;new_type1;old_category1;old_type1
beta;new_category2;new_type2;old_category2;old_type2
gamma;new_category3;new_type3;old_category3;old_type3
And my input.txt content is:
bla bla "bla"
buuu buuu 123456 "test"
"gamma" "old_category3" "old_type3"
alfa
When I run this script it creates exactly the same output as the initial file. Can someone tell me why it didn't change the line where "gamma" appears according to my mapping?
Thanks in advance.
A couple of things to change.
Firstly, there is no need to convert $mapping to a hash table; Import-Csv already gives you an object array to work with.
Secondly, if you want to update the elements of $content, you need to use a for loop so that you can access and modify them directly. A foreach loop creates a new variable for each iteration; you were modifying that copy but never writing it back to $content.
Below should work:
$file ="map.csv"
$mapping = Import-CSV $file -Encoding UTF8 -Delimiter ";"
$original_file = "input.txt"
$destination_file = "output.txt"
$content = Get-Content $original_file
for($i=0; $i -lt $content.length; $i++) {
foreach($map in $mapping) {
if ($content[$i] -like "*$($map.Name)*") {
$content[$i] = $content[$i] -replace $map.old_category, $map.new_category
$content[$i] = $content[$i] -replace $map.old_type, $map.new_type
}
}
}
Set-Content -Path $destination_file -Value $content
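Given the map.csv and input.txt shown in the question, the corrected script should produce an output.txt along these lines; the gamma line picks up both replacements, while the alfa line matches by name but contains none of the old strings, so it passes through unchanged:

bla bla "bla"
buuu buuu 123456 "test"
"gamma" "new_category3" "new_type3"
alfa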