I'm doing multiple regex replacements in PowerShell on a large number of files and would like to write a file only if any replacements were actually made.
For example if I do:
($_ | Get-Content -Raw) -Replace 'MAKEUPS', 'Makeup' -Replace '_MAKEUP', 'Makeup' -Replace 'Make up', 'Makeup' -Replace 'Make-up', 'Makeup' -Replace '"SELF:/', '"' |
Out-File $_.FullName -encoding ASCII
I only want to write the file if it found anything to replace. Is this possible, maybe with a count or boolean operation?
I did think maybe to check the length of the string before and after but was hoping for a more elegant solution, so I thought I'd ask the experts!
You can take advantage of the fact that PowerShell's -replace operator passes the input string through as-is if no replacements were performed:
# <# some Get-ChildItem command #> ... | ForEach-Object {
# Read the input file in full, as a single string.
$originalContent = $_ | Get-Content -Raw
# *Potentially* perform replacements, depending on whether the search patterns are found.
$potentiallyModifiedContent =
$originalContent -replace 'MAKEUPS', 'Makeup' -replace '_MAKEUP', 'Makeup' -replace 'Make up', 'Makeup' -replace 'Make-up', 'Makeup' -replace '"SELF:/', '"'
# Save, but only if modifications were made.
if (-not [object]::ReferenceEquals($originalContent, $potentiallyModifiedContent)) {
Set-Content -NoNewLine -Encoding Ascii -LiteralPath $_.FullName -Value $potentiallyModifiedContent
}
# }
[object]::ReferenceEquals() tests for reference equality, i.e. whether the two strings represent the exact same string instance, which makes the comparison very efficient (no need to look at the content of the strings).
Set-Content rather than Out-File is used to write the output file, which is preferable for performance reasons with input that is made up of strings already.
-NoNewLine is needed to prevent a trailing newline from getting appended to the output file.
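The pass-through behaviour described above can be checked in isolation. The following is a minimal sketch with made-up sample strings and hypothetical variable names; it is not part of the original answer:

```powershell
# Sample inputs; the variable names are illustrative only.
$original = 'Nothing to change here'
$source   = 'Make-up box'

# No pattern matches, so -replace is expected to hand back the very same string instance.
$unchanged = $original -replace 'MAKEUPS', 'Makeup' -replace 'Make-up', 'Makeup'

# A pattern matches, so a new string is created.
$changed = $source -replace 'Make-up', 'Makeup'

$sameInstance  = [object]::ReferenceEquals($original, $unchanged)
$otherInstance = [object]::ReferenceEquals($source, $changed)
```

Under these assumptions `$sameInstance` should be true and `$otherInstance` false, which is what the `if` test in the answer relies on.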
You could use the script block feature added in PowerShell 6 to set a variable when a replacement takes place, and then return the replacement string.
$replaced = $false
$content = (Get-content -raw $file) -replace "(make-up|makeups|make up|...)", {
# $replaced = $true
Set-Variable replaced $true -Scope 1
return "Makeup"
} -replace '"SELF:/', {
Set-Variable replaced $true -Scope 1
# $replaced = $true
return '"'
}
If ($replaced){
Set-content -path $file -value $content
}
In older versions of PowerShell, you might check the content length and if they're the same do a comparison... I wouldn't do a match to see if replacement is needed, that would be a lot more expensive...
$original = (Get-content -raw $file)
$content = ($original) -replace "(make-up|makeups|make up|...)", "Makeup"
If (($original.length -ne $content.length) -or ($original -cne $content)) {
Set-content ...
}
I have a script that I am working on that reads in some text files and converts them to .csv and changes some values. I have two different file sources. One is a tab delimited .txt file and the other is a comma separated .txt file. Is there a way to determine which type of delimiter is being used to determine which export function is appropriate?
get-childitem $workingDir -filter *.txt -Recurse| ForEach-Object {
$targetfile = $_.Name
$targetFile = $_.FullName.Substring(0,$_.FullName.Length-4)
$targetFile = $targetfile += ".csv"
if( Get-Content -Delimiter = `t ){
Write-Host "The file is tab-delimited"
Get-Content -path $_.FullName |
ForEach-Object {$_ -replace "`t","," } |
Out-File -filepath $targetFile -Encoding utf8
}
else {
Write-Host "The file is comma-separated"
Get-Content -path $_.FullName |
Out-File -filepath $targetFile -Encoding utf8
}
}
Another approach would be to use Select-String to check for tab character and set delimiter.
if(Get-Content $csvfile -First 1 | Select-String -Pattern "`t")
{
$delim = "`t"
}
else
{
$delim = ','
}
Import-Csv $csvfile -Delimiter $delim
Assuming that the comma-separated files never contain tabs (which would then be data), the most efficient approach is to inspect only the first line of each file for the presence of tab characters, which is most easily done with (Get-Content -First 1 $_.FullName) -match "`t" - see Get-Content and -match, the regular-expression matching operator.
# Determine the arguments to pass to Set-Content - later, via splatting -
# for writing the output file.
$setContentArgs = @{
LiteralPath = $_.BaseName + '.csv'
Encoding = 'utf8'
}
# Check the 1st line for containing a tab.
# (This assumes that the comma-separated files contain no tabs as data.)
if ((Get-Content -First 1 $_.FullName) -match "`t") {
Write-Host "The file is tab-delimited."
# Read line by line, replace tabs with commas, and write with UTF-8 encoding.
Get-Content $_.FullName | ForEach-Object { $_ -replace "`t", ',' } |
Set-Content @setContentArgs
}
else {
Write-Host "The file is comma-separated."
# Just read lines as-is and write with UTF-8 encoding.
Get-Content $_.FullName |
Set-Content @setContentArgs
}
Note the use of the .BaseName property on the input [System.IO.FileInfo], which conveniently reports the file name without its extension, which allows you to simply append the new extension.
Since you're dealing with text (strings) only, Set-Content, which is slightly more efficient, is preferable to Out-File.
For the technique of passing arguments via a hashtable (@{ ... }), see about_Splatting.
If the files are smallish (each easily fits into memory as a whole, possibly twice), you can significantly speed up processing by reading each file as a whole with -Raw and using -NoNewLine (PSv5+) to write that (possibly modified) string as-is, without appending a trailing newline, to the output file.
Since you're then reading the entire file anyway, you can get away with a single Get-Content call and apply -replace "`t", ',' blindly, given that for comma-separated files this will simply be a (fast) no op.
(Get-Content -Raw $_.FullName) -replace "`t", ',' |
Set-Content ($_.BaseName + '.csv') -Encoding Utf8 -NoNewLine
I would use Import-Csv for this:
If(Import-Csv "File path to test if Tab-delimited file" -Delimiter "`t" -Ea SilentlyContinue){
"File is tab-delimited"
}
If(Import-Csv "File path to test if Comma-CSV file" -Ea SilentlyContinue){
"File is a comma-separated CSV"
}
I would like to remove all quotation characters from my exported CSV file. It's very annoying that whenever I generate a new CSV file I need to manually remove all the quotation marks around the strings. Could anyone provide a PowerShell script to overcome this problem? Thanks.
$File = "c:\programfiles\programx\file.csv"
(Get-Content $File) | Foreach-Object {
$_ -replace """, ""
} | Set-Content $File
Next time you make one, Export-Csv in PowerShell 7 has a new option you may like:
Export-Csv -UseQuotes AsNeeded
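A small sketch of what that option does, assuming PowerShell 7 or later. The sample object and variable names are made up for illustration, and ConvertTo-Csv is used instead of Export-Csv only so that the result can be inspected without writing a file:

```powershell
# Requires PowerShell 7 or later; the sample data is hypothetical.
$row = [pscustomobject]@{ Name = 'plain'; Note = 'needs, quoting' }

# ConvertTo-Csv accepts the same -UseQuotes parameter as Export-Csv.
$csvLines = $row | ConvertTo-Csv -UseQuotes AsNeeded

# Only the value that contains the delimiter keeps its quotes.
$csvLines
```

The data line should come out as `plain,"needs, quoting"`, so the field that really needs quoting stays intact.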
It seems many of us have already explained that quotes are sometimes needed in CSV files. This is the case when:
the value contains a double quote
the value contains the delimiter character
the value contains newlines or has whitespace at the beginning or the end of the string
With PS version 7 you have the option to use parameter -UseQuotes AsNeeded.
For older versions I made this helper function to convert to CSV using only quotes when needed:
function ConvertTo-CsvNoQuotes {
# returns a csv delimited string array with values unquoted unless needed
[OutputType('System.Object[]')]
[CmdletBinding(DefaultParameterSetName = 'ByDelimiter')]
param (
[Parameter(Mandatory = $true, ValueFromPipeline = $true, ValueFromPipelineByPropertyName = $true, Position = 0)]
[PSObject]$InputObject,
[Parameter(Position = 1, ParameterSetName = 'ByDelimiter')]
[char]$Delimiter = ',',
[Parameter(ParameterSetName = 'ByCulture')]
[switch]$UseCulture,
[switch]$NoHeaders,
[switch]$IncludeTypeInformation # by default, this function does NOT include type information
)
begin {
if ($UseCulture) { $Delimiter = (Get-Culture).TextInfo.ListSeparator }
# regex to test if a string contains a double quote, the delimiter character,
# newlines or has whitespace at the beginning or the end of the string.
# if that is the case, the value needs to be quoted.
$needQuotes = '^\s|["{0}\r\n]|\s$' -f [regex]::Escape($Delimiter)
# a boolean to check if we have output the headers or not from the object(s)
# and another to check if we have output type information or not
$doneHeaders = $doneTypeInfo = $false
}
process {
foreach($item in $InputObject) {
if (!$doneTypeInfo -and $IncludeTypeInformation) {
'#TYPE {0}' -f $item.GetType().FullName
$doneTypeInfo = $true
}
if (!$doneHeaders -and !$NoHeaders) {
$row = $item.PsObject.Properties | ForEach-Object {
# if needed, wrap the value in quotes and double any quotes inside
if ($_.Name -match $needQuotes) { '"{0}"' -f ($_.Name -replace '"', '""') } else { $_.Name }
}
$row -join $Delimiter
$doneHeaders = $true
}
$item | ForEach-Object {
$row = $_.PsObject.Properties | ForEach-Object {
# if needed, wrap the value in quotes and double any quotes inside
if ($_.Value -match $needQuotes) { '"{0}"' -f ($_.Value -replace '"', '""') } else { $_.Value }
}
$row -join $Delimiter
}
}
}
}
Using your example to remove the unnecessary quotes in an existing CSV file:
$File = "c:\programfiles\programx\file.csv"
(Import-Csv $File) | ConvertTo-CsvNoQuotes | Set-Content $File
keeping in mind that this may trash your data if you have embedded double quotes in your data, here is yet another variation on the idea ... [grin]
what it does ...
defines the input & output full file names
grabs the *.tmp files from the temp dir
filters for the 1st three files & only three basic properties
creates the file to work with
loads the file content
replaces the double quotes with nothing
saves the cleaned file to the 2nd file name
displays the original & cleaned versions of the file
the code ...
$TestCSV = "$env:TEMP\Ted.Xiong_-_Test.csv"
$CleanedTestCSV = $TestCSV -replace 'Test', 'CleanedTest'
Get-ChildItem -LiteralPath $env:TEMP -Filter '*.tmp' -File |
Select-Object -Property Name, LastWriteTime, Length -First 3 |
Export-Csv -LiteralPath $TestCSV -NoTypeInformation
(Get-Content -LiteralPath $TestCSV) -replace '"', '' |
Set-Content -LiteralPath $CleanedTestCSV
Get-Content -LiteralPath $TestCSV
'=' * 30
Get-Content -LiteralPath $CleanedTestCSV
output ...
"Name","LastWriteTime","Length"
"hd4130E.tmp","2020-03-13 5:23:06 PM","0"
"hd418D4.tmp","2020-03-12 11:47:59 PM","0"
"hd41F7D.tmp","2020-03-13 5:23:09 PM","0"
==============================
Name,LastWriteTime,Length
hd4130E.tmp,2020-03-13 5:23:06 PM,0
hd418D4.tmp,2020-03-12 11:47:59 PM,0
hd41F7D.tmp,2020-03-13 5:23:09 PM,0
As above, the quotations are valid for CSV, but to remove them you need to escape the quote mark in the replace operation, as it is a special character:
$File = "c:\programfiles\programx\file.csv"
(Get-Content $File) | Foreach-Object {
$_ -replace "`"", ""
} | Set-Content $File
Why are you reading CSV files manually in a text editor?
You exported them to that format for a reason. To read them, just import them back in and view them on screen, or read them back in and send the output to Notepad for reading.
Export-Csv -Path D:\temp\book1.csv
Import-Csv -Path D:\temp\book1.csv |
Clip |
Notepad # then press Ctrl+V, then save the Notepad file with a new name.
If you don't want Csv, then don't export as Csv, just output as a flat-file, using Out-File instead.
Update
Your last comment to me indicated your final use case. Importing CSV into SQL is a very common thing. A quick web search will show you how and even provide you with a script. You should also be looking at the PowerShell dbatools module.
How to import data from .csv in SQL Server using PowerShell?
Importing CSV files into a Microsoft SQL DB using PowerShell
ImportingCSVsIntoSQLv1.zip
Four Easy Ways to Import CSV Files to SQL Server with PowerShell
Find-Module -Name '*dba*'
<#
Version Name Repository Description
------- ---- ---------- -----------
1.0.101 dbatools PSGallery The community module that enables SQL Server Pros to automate database development and server administration
...
#>
Update
You mean this...
Get-Content 'D:\temp\book1.csv'
<#
# Results
"Site","Dept"
"Main","aaa,bbb,ccc"
"Branch1","ddd,eee,fff"
"Branch2","ggg,hhh,iii"
#>
Get-ChildItem -Path 'D:\temp' -Filter 'book1.csv' |
ForEach {
$NewFile = New-Item -Path 'D:\Temp' -Name "$($PSItem.BaseName).txt"
Get-Content -Path $PSItem.FullName |
ForEach-Object {
Add-Content -Path $NewFile -Value ($PSItem -replace '"') -WhatIf
}
}
<#
What if: Performing the operation "Add Content" on target "Path: D:\Temp\book1.txt".
What if: Performing the operation "Add Content" on target "Path: D:\Temp\book1.txt".
What if: Performing the operation "Add Content" on target "Path: D:\Temp\book1.txt".
What if: Performing the operation "Add Content" on target "Path: D:\Temp\book1.txt"
#>
Get-ChildItem -Path 'D:\temp' -Filter 'book1.csv' |
ForEach {
$NewFile = New-Item -Path 'D:\Temp' -Name "$($PSItem.BaseName).txt"
Get-Content -Path $PSItem.FullName |
ForEach-Object {
Add-Content -Path $NewFile -Value ($PSItem -replace '"')
}
}
Get-Content 'D:\temp\book1.txt'
<#
# Results
Site,Dept
Main,aaa,bbb,ccc
Branch1,ddd,eee,fff
Branch2,ggg,hhh,iii
#>
Of course, you need to use a wildcard for the CSV files, use -Recurse to get all directories, and add an error handler to make sure you don't have file name collisions.
One solution that does not remove the double quotes inside quoted strings:
$delimiter=","
$InputFile="c:\programfiles\programx\file.csv"
$OutputFile="c:\programfiles\programx\resultfile.csv"
#import file in variable (not necessary; if your file is big, repeat this import wherever I use $ContentFile)
$ContentFile=import-csv $InputFile -Delimiter $delimiter -Encoding utf8
#list of property of csv file
$properties=($ContentFile | select -First 1 | Get-Member -MemberType NoteProperty).Name
#write header into new file
$properties -join $delimiter | Out-File $OutputFile -Encoding utf8
#write data into new file
$ContentFile | %{
$RowObject=$_ #==> get row object
$Line=@() #==> create array
$properties | %{$Line+=$RowObject."$_"} #==> loop over every property, take its value (without quotes) from the row object
$Line -join $delimiter #==> join the array into a line with the delimiter and send it to standard output
} | Out-File $OutputFile -Encoding utf8 -Append #==> export result to output file
An extra double quote can be used to escape a double quote in a string:
$File = "c:\programfiles\programx\file.csv"
(Get-Content $File) | Foreach-Object { $_ -replace """", "" } | Set-Content $File
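The different quoting styles that appear in this thread are interchangeable. A short sketch with a made-up sample line and illustrative variable names:

```powershell
# Sample CSV line; hypothetical data.
$line = '"Name","Value"'

# Three equivalent ways to write a pattern consisting of one double quote:
$viaSingleQuotes = $line -replace '"', ''    # single-quoted literal, nothing to escape
$viaDoubling     = $line -replace """", ''   # doubled quote inside a double-quoted string
$viaBacktick     = $line -replace "`"", ''   # backtick-escaped quote inside a double-quoted string
```

All three variables should end up holding `Name,Value`. The single-quoted form is arguably the easiest to read.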
After you have exported the CSV file with Export-CSV, you can use Get-Content to load the CSV file into an array of strings, then use Set-Content and replace to remove the quotation marks:
Set-Content -Path sample.csv -Value ((Get-Content -Path sample.csv) -replace '"')
As mklement0 helpfully pointed out, this could potentially corrupt the CSV if some lines need quoting. This solution simply goes through the whole file and replaces every quote with ''.
You could also speed this up by using the -Raw switch with Get-Content, which returns a whole string with the newlines preserved, instead of an array of newline-delimited strings:
Set-Content -NoNewline -Path sample.csv -Value ((Get-Content -Raw -Path sample.csv) -replace '"')
I am new to programming and am trying to improve the following script. I was thinking of making a function to improve it, but I do not know where to start or whether that is the best option.
$replaceText01 = (Get-Content -path $copyFileLocation -Raw) -replace '"INSTANCENAME="TEST""',$NUEVAINESTANCIA
Set-Content $copyFileLocation $replaceText01
$replaceText02 = (Get-Content -path $copyFileLocation -Raw) -replace '"INSTANCEID="TEST""',$NUEVAINESTANCIAID
Set-Content $copyFileLocation $replaceText02
$replaceText03 = (Get-Content -path $copyFileLocation -Raw) -replace "NT Service\SQLAgent#TEST", $CUENTAAGTN
Set-Content $copyFileLocation $replaceText03
$replaceText04 = (Get-Content -path $copyFileLocation -Raw) -replace "NT Service\MSSQL#TEST", $CUENTASQLSER
Set-Content $copyFileLocation $replaceText04
$user = "$env:UserDomain\$env:USERNAME"
write-host $user
$replaceText = (Get-Content -path $copyFileLocation -Raw) -replace "##MyUser##", $user
Set-Content $copyFileLocation $replaceText
First off, I would probably try to read the file only once. Then since you are doing many similar operations, I would put all the data about the operations in an array, and then iterate over those data.
In this code, I first read the file. Then I define all the strings that should be replaced together with the strings that should replace them. Then I use a loop to iterate over the data so that we don't repeat the same code all the time.
$data = Get-Content -Path $copyFileLocation -Raw
$replacements = @(
@('"INSTANCENAME="TEST""', $NUEVAINESTANCIA),
@('"INSTANCEID="TEST""', $NUEVAINESTANCIAID),
@("NT Service\SQLAgent#TEST", $CUENTAAGTN),
@("NT Service\MSSQL#TEST", $CUENTASQLSER),
@("##MyUser##", "$env:UserDomain\$env:USERNAME")
)
$replacements | ForEach-Object {
$data = $data.Replace($_[0], $_[1])
}
Set-Content -Path $copyFileLocation -Value $data
It's also possible to get this even shorter if you pipe the list of replacements directly instead of assigning it to a variable first:
$data = Get-Content -Path $copyFileLocation -Raw
@(
@('"INSTANCENAME="TEST""', $NUEVAINESTANCIA),
@('"INSTANCEID="TEST""', $NUEVAINESTANCIAID),
@("NT Service\SQLAgent#TEST", $CUENTAAGTN),
@("NT Service\MSSQL#TEST", $CUENTASQLSER),
@("##MyUser##", "$env:UserDomain\$env:USERNAME")
) | ForEach-Object {
$data = $data.Replace($_[0], $_[1])
}
Set-Content -Path $copyFileLocation -Value $data
Edit: Missed that you were asking on how to make it into a function.
By looking at what you're doing, I assume you are modifying an SQL unattended install file, and I have named the function accordingly.
A good idea here is to make most of the parameters mandatory, so you are sure the user at least specifies all the required parameters. Maybe you want to specify MyUser later, so this is a good candidate for being a parameter with a default value.
Function Set-SQLInstallFileVariables {
Param(
[Parameter(Mandatory)][string]$FilePath,
[Parameter(Mandatory)][string]$NUEVAINESTANCIA,
[Parameter(Mandatory)][string]$NUEVAINESTANCIAID,
[Parameter(Mandatory)][string]$CUENTAAGTN,
[Parameter(Mandatory)][string]$CUENTASQLSER,
[string]$MyUser = "$env:UserDomain\$env:USERNAME"
)
$data = Get-Content -Path $FilePath -Raw
@(
@('"INSTANCENAME="TEST""', $NUEVAINESTANCIA),
@('"INSTANCEID="TEST""', $NUEVAINESTANCIAID),
@("NT Service\SQLAgent#TEST", $CUENTAAGTN),
@("NT Service\MSSQL#TEST", $CUENTASQLSER),
@("##MyUser##", $MyUser)
) | ForEach-Object {
$data = $data.Replace($_[0], $_[1])
}
# Write back to the file that was passed in via -FilePath.
Set-Content -Path $FilePath -Value $data
}
The first thing to notice is that you read and write the file over and over again on every replacement, which is not very efficient.
This is not needed; once read the content is in a string variable and can get manipulated in memory multiple times before writing back to file.
One approach is to use string arrays that hold the search and replacement strings.
For this to work properly, both arrays must have the same number of elements.
$inputFile = 'D:\Test\TheFile.txt' # your input file path here ($copyFileLocation)
$outputFile = 'D:\Test\TheReplacedFile.txt' # for safety create a new file instead of overwriting the original
$searchStrings = '"INSTANCENAME="TEST""','"INSTANCEID="TEST""',"NT Service\SQLAgent#TEST","NT Service\MSSQL#TEST","##MyUser##"
$replaceStrings = $NUEVAINESTANCIA, $NUEVAINESTANCIAID, $CUENTAAGTN, $CUENTASQLSER, "$env:UserDomain\$env:USERNAME"
# get the current content of the file
$content = Get-Content -Path $inputFile -Raw
# loop over the search and replace strings to do all replacements
for ($i = 0; $i -lt $searchStrings.Count; $i++) {
$content = $content -replace [regex]::Escape($searchStrings[$i]), $replaceStrings[$i]
}
# finally, write the updated content to a (new) file
$content | Set-Content -Path $outputFile
Another approach would be to use a Hashtable that stores both the search strings and the replacement strings:
$inputFile = 'D:\Test\TheFile.txt' # your input file path here ($copyFileLocation)
$outputFile = 'D:\Test\TheReplacedFile.txt' # for safety create a new file instead of overwriting the original
$hash = @{
'"INSTANCENAME="TEST""' = $NUEVAINESTANCIA
'"INSTANCEID="TEST""' = $NUEVAINESTANCIAID
"NT Service\SQLAgent#TEST" = $CUENTAAGTN
"NT Service\MSSQL#TEST" = $CUENTASQLSER
"##MyUser##" = "$env:UserDomain\$env:USERNAME"
}
# get the current content of the file
$content = Get-Content -Path $inputFile -Raw
# loop over the items in the hashtable to do all replacements
$hash.GetEnumerator() | ForEach-Object {
# the `$_` is an automatic variable you get within a ForEach-Object{}
# It represents a single item on each iteration.
$content = $content -replace [regex]::Escape($_.Key), $_.Value
}
# finally, write the updated content to a (new) file
$content | Set-Content -Path $outputFile
In both cases, we're using -replace, which is a case-insensitive regex replacement. Because your search strings hold characters that have special meaning in regex (# and \) we need to escape these with [regex]::Escape()
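To see why the escaping matters, here is a minimal sketch with a made-up sample string and illustrative variable names. It contrasts an unescaped pattern, an escaped pattern, and the .NET String.Replace() method that another answer in this thread uses:

```powershell
# Hypothetical sample text containing one literal 'a.b' and one 'a-b'.
$text = 'a.b a-b'

# Unescaped, '.' is a regex wildcard that matches any character,
# so both 'a.b' and 'a-b' are replaced.
$unescaped = $text -replace 'a.b', 'X'

# Escaped, only the literal sequence 'a.b' is matched.
$escaped = $text -replace [regex]::Escape('a.b'), 'X'

# String.Replace() never treats its arguments as a regex, but it is case-sensitive.
$literal = $text.Replace('a.b', 'X')
```

With this input `$unescaped` is `X X`, while `$escaped` and `$literal` are both `X a-b`.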
Hope that helps
Here is my attempt at it. It's not meant to be lean, but rather to be really clear about what it is doing, while being full of functions (too many) and stopping you from getting the content multiple times.
After seeing what everyone else answered, theirs is better code. Hopefully mine is readable and gives you a better idea about using the pipeline and functions :)
Function Replace1 {Process{$_ -replace '"INSTANCENAME="TEST""',$NUEVAINESTANCIA}}
Function Replace2 {Process{$_ -replace '"INSTANCEID="TEST""',$NUEVAINESTANCIAID}}
# The backslash is a regex metacharacter, so these search strings are escaped.
Function Replace3 {Process{$_ -replace [regex]::Escape('NT Service\SQLAgent#TEST'), $CUENTAAGTN}}
Function Replace4 {Process{$_ -replace [regex]::Escape('NT Service\MSSQL#TEST'), $CUENTASQLSER}}
Function Replace5 {Process{$_ -replace "##MyUser##", $user}}
$user = "$env:UserDomain\$env:USERNAME"
Write-Host $user
# The parentheses make Get-Content finish reading before Set-Content rewrites the same file.
(Get-Content -path $copyFileLocation) | Replace1 | Replace2 | Replace3 | Replace4 | Replace5 | Set-Content -path $copyFileLocation
How can I read every CSV file in a specific folder? When the script below is executed, it only removes the quote characters from one CSV file.
$file="C:\test\IV-1-2020-04-02.csv"
(GC $file) | % {$_ -replace '"', ''} > $file
Get-ChildItem -Path C:\test\ -Filter '*.csv'
It only removes the quote characters from "IV-1-2020-04-02.csv". What if I have different file names?
You can iterate each .csv file from Get-ChildItem and replace the quotes " with '' using Set-Content.
$files = Get-ChildItem -Path "YOUR_FOLDER_PATH" -Filter *.csv
foreach ($file in $files)
{
Set-Content -Path $file.FullName -Value ((Get-Content -Path $file.FullName -Raw) -replace '"', '')
}
Make sure to pass your folder path to -Path, which tells Get-ChildItem to fetch every file from this folder
It's also faster to use the -Raw switch for Get-Content, since it reads the file into one string and preserves newlines. If you omit this switch, Get-Content will by default split the content at newlines into an array of strings.
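The difference between the two read modes can be seen with a throwaway file. This is only a sketch; the temporary file and the variable names are assumptions made for the illustration:

```powershell
# Create a temporary file with three lines.
$tmp = New-TemporaryFile
Set-Content -LiteralPath $tmp.FullName -Value 'one', 'two', 'three'

# Without -Raw: an array with one string per line.
$asLines = Get-Content -LiteralPath $tmp.FullName

# With -Raw: a single string that still contains the newlines.
$asString = Get-Content -LiteralPath $tmp.FullName -Raw

$lineCount   = $asLines.Count
$isOneString = $asString -is [string]

# Clean up the temporary file.
Remove-Item -LiteralPath $tmp.FullName
```

Here `$lineCount` is 3 and `$isOneString` is true, so with -Raw the -replace operator runs once on one string instead of once per line.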
If you want to read files in deeper sub directories as well, then add the -Recurse switch to Get-ChildItem:
$files = Get-ChildItem -Path "YOUR_FOLDER_PATH" -Filter *.csv -Recurse
Additionally, you could also use ForEach-Object here:
Get-ChildItem -Path "YOUR_FOLDER_PATH" -Filter *.csv -Recurse | ForEach-Object {
Set-Content -Path $_.FullName -Value ((Get-Content -Path $_.FullName -Raw) -replace '"', '')
}
Furthermore, you could replace ForEach-Object with its alias %. However, if you're using VSCode and have PSScriptAnalyzer enabled, you may get this warning:
'%' is an alias of 'ForEach-Object'. Alias can introduce possible problems and make scripts hard to maintain. Please consider changing alias to its full content.
This warns against using aliases, for maintainability. It's much safer and more portable to use the full version. I only use the aliases for quick command line usage, but when writing scripts I use the full versions.
Note: The above solutions could potentially corrupt the CSV if some lines need quoting. This solution simply goes through the whole file and replaces every quote with ''. PowerShell 7 offers a -UseQuotes AsNeeded option for Export-Csv, so you may look into that instead.
Don't just replace all the " unless you are very certain that it's a good idea; otherwise replace the " when it shouldn't matter because the field doesn't contain text with a comma, double quote, nor line break. (see RFC-4180 section 2, #6 and #7)
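The rule can be expressed as a small check. The function name below is hypothetical, and the sketch assumes a comma as the delimiter:

```powershell
# Hypothetical helper: returns $true if a field must stay quoted,
# i.e. it contains a comma, a double quote, or a line break (CR or LF).
function Test-NeedsCsvQuoting {
    param([string]$Field)
    $Field -match '[,"\r\n]'
}

$plainField = Test-NeedsCsvQuoting 'abc'
$commaField = Test-NeedsCsvQuoting 'a,b'
$quoteField = Test-NeedsCsvQuoting 'say "hi"'
$breakField = Test-NeedsCsvQuoting "line1`nline2"
```

Only `$plainField` is false here; stripping the quotes from any of the other three fields would change how the row is parsed.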
As with any script that overwrites its working files, make sure you have backups of those files should you want an undo option later on...
$tog = $true
$sep = ':_:'
$header=@()
filter asString{
$obj=$_
if($tog){
$header=(gm -InputObject $obj -Type NoteProperty).Name
$hc = $header.Count-1
$tog=$false
$str = $header -join $sep
$str = "$sep$str" -replace '"','""'
$str = $str -replace "$sep(((?!$sep)[\s\S])*(,|""|\n)((?!$sep)[\s\S])*)",($sep+'"$1"')
($str -replace $sep,',').Substring(1)
}
$str = (0..$hc | %{$obj.($header[$_])}) -join $sep
$str = "$sep$str" -replace '"','""'
$str = $str -replace "$sep(((?!$sep)[\s\S])*(,|""|\n)((?!$sep)[\s\S])*)",($sep+'"$1"')
($str -replace $sep,',').Substring(1)
}
ls *.csv | %{$tog=$true;import-csv $_ | asString | sc "$_.new";$_.FullName} | %{if(test-path "$_.new"){mv "$_.new" $_ -force}}
Note: the CSV files are expected to contain their own headers. You could work around that if you needed to with the use of the -Header option of Import-Csv