Add missing comma to a CSV with Powershell - powershell

I have a CSV which I process using powershell where occasionally one or more of the rows will be missing one of the comma delimiters. It will always have 3 columns and the 2nd column is optional.
Ex.
Col1,Col2,Col3
SomeCol1Val,,SomeCol3Val
AnotherCol1Val,AnotherCol3Val
In the above example I need to add another comma to Row #2
I've been able to determine which row needs to be updated and how to change the value, but I'm not sure how to overwrite that specific row in the file.
$csvFile = Get-Content "C:\MyFile.csv"
foreach($row in $csvFile) {
$cnt = ($row.ToCharArray() -eq ',').count
if ($cnt -eq 1) {
$row = $row -replace ",",",,"
}
}
Thanks

As Doug Maurer points out, all that is missing from your code is to write the updated $row values back to your input file, using the Set-Content cmdlet.
However, I suggest a different, faster approach, using a switch statement with the -File option and a single -replace operation based on a regex.
$csvFile = 'C:\MyFile.csv'
$newContent =
switch -File $csvFile {
default { $_ -replace '^([^,]+),([^,]+)$', '$1,,$2' }
}
Set-Content $csvFile -Value $newContent -WhatIf
Note: The -WhatIf common parameter in the command above previews the operation. Remove -WhatIf once you're sure the operation will do what you want.
Note that you may have to use the -Encoding parameter to specify the desired character encoding, which in Windows PowerShell is the active ANSI code page and in PowerShell [Core] v6+ BOM-less UTF-8.
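To see what the regex in the switch approach does, here is a minimal sketch that applies the same -replace operation to in-memory sample rows (the row values are taken from the question; the variable names are just for illustration). Only a row with exactly one comma matches, so complete rows pass through unchanged:

```powershell
# Sample rows: header, a complete row, and a row missing its middle comma.
$rows = 'Col1,Col2,Col3', 'SomeCol1Val,,SomeCol3Val', 'AnotherCol1Val,AnotherCol3Val'

# -replace works element-wise on an array; rows that don't match the
# pattern are returned unchanged.
$fixed = $rows -replace '^([^,]+),([^,]+)$', '$1,,$2'
$fixed
```

The pattern requires at least one non-comma character on each side of a single comma, which is why the header and the row with an empty middle field are left alone.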
If you wanted to stick with your original approach:
$csvFile = 'C:\MyFile.csv'
$newContent =
foreach ($row in Get-Content $csvFile) {
if (($row.ToCharArray() -eq ',').Count -eq 1) {
$row -replace ',', ',,'
} else {
$row
}
}
Set-Content $csvFile -Value $newContent -WhatIf
Note that both approaches collect all (modified) lines in memory as a whole, so as to speed up the operation and also to allow writing back to the input file.
However, it is possible to stream the output, to a different file - i.e. to write the output file line by line - by enclosing the switch statement in & { ... } and piping that to Set-Content. With your Get-Content approach you'd have to use
Get-Content ... | ForEach-Object { ... } | Set-Content instead.
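A minimal sketch of the streaming variant described above, assuming two hypothetical temp files so that it can be run as-is. Note that the output file must differ from the input file, because the input is still being read while the output is written:

```powershell
# Hypothetical temp files, used only to make the sketch self-contained.
$inFile  = Join-Path ([IO.Path]::GetTempPath()) 'stream-in.csv'
$outFile = Join-Path ([IO.Path]::GetTempPath()) 'stream-out.csv'
Set-Content $inFile -Value 'Col1,Col2,Col3', 'A,,C', 'X,Z'

# Wrapping switch in & { ... } lets its output flow through the pipeline
# line by line instead of being collected in a variable first.
& {
    switch -File $inFile {
        default { $_ -replace '^([^,]+),([^,]+)$', '$1,,$2' }
    }
} | Set-Content $outFile

Get-Content $outFile
```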

Related

Remove additional commas in CSV file using Powershell

I have a CSV file that I'd like to import into SQL, but it isn't properly formatted. I am not able to change the format of the generated file (an Excel file), so I'm looking to fix the CSV file using PowerShell. I want to remove the extra commas and also replace the empty department field (,,,,,,) with the correct department name, as seen in the example below. Thank you in advance.
Example:
Current Format:
Department,,,,,,First Name,,,,Last Name,,,,,,,School Year,Enrolment Status
Psychology ,,,,,,,,,,,,,,,,,,,,,,, (Remove this line)
,,,,,,Jane,,,,Doe,,,,,,,2022,Enrolled
,,,,,,Jeff,,,,Dane,,,,,,,2019,Enrolled
,,,,,,Tate,,,,Anderson,,,,,,,2019,Not Enrolled
,,,,,,Daphne,,,,Miller,,,,,,,2021,Enrolled
,,,,,,Cora,,,,Dame,,,,,,,2022,Enrolled
Computer Science ,,,,,,,,,,,,,,,,,,,,,,, (Remove this line)
,,,,,,Dora,,,,Explorer,,,,,,,2022,Not Enrolled
,,,,,,Peppa,,,,Diggs,,,,,,,2020,Enrolled
,,,,,,Conrad,,,,Strat,,,,,,,2020,Enrolled
,,,,,,Kat,Noir,,,,2019,,,,,,,Enrolled
,,,,,,Lance,,,,Bug,2018,,,,,,,Enrolled
Ideal format:
Department,First Name,Last Name,School Year,Enrolment Status
Psychology ,,,,,,,,,,,,,,,,,,,,,,, (Remove this line)
Psychology,Jane,Doe,2022,Enrolled
Psychology,Jeff,Dane,2019,Enrolled
Psychology,Tate,Anderson,2019,Not Enrolled
Psychology,Daphne,Miller,2021,Enrolled
Psychology,Cora,Dame,2022,Enrolled
Computer Science ,,,,,,,,,,,,,,,,,,,,,,, (Remove this line)
Computer Science,Dora,Explorer,2022,Not Enrolled
Computer Science,Peppa,Diggs,2020,Enrolled
Computer Science,Conrad,Strat,2020,Enrolled
Computer Science,Kat,Noir,2019,Enrolled
Computer Science,Lance,Bug,2018,Enrolled
here you go:
$csvArray = new-object System.Collections.Generic.List[string]
#Import the file
$text = (gc "C:\tmp\testdata.txt") -replace ",{2,}",","
$arrayEnd = $text.count -1
$text[1..$arrayEnd] | %{
If ($_ -notmatch "^(,)"){
$department = $_ -replace ","
}
Else {
$csvArray.add($department + $_)
}
}
$csvArray.Insert(0,$text[0])
$csvArray | set-content 'C:\tmp\my.csv'
Using the Csv cmdlets:
$Csv = @'
Department,,,,,,First Name,,,,Last Name,,,,,,,School Year,Enrolment Status
Psychology ,,,,,,,,,,,,,,,,,,,,,,, (Remove this line)
,,,,,,Jane,,,,Doe,,,,,,,2022,Enrolled
,,,,,,Jeff,,,,Dane,,,,,,,2019,Enrolled
,,,,,,Tate,,,,Anderson,,,,,,,2019,Not Enrolled
,,,,,,Daphne,,,,Miller,,,,,,,2021,Enrolled
,,,,,,Cora,,,,Dame,,,,,,,2022,Enrolled
Computer Science ,,,,,,,,,,,,,,,,,,,,,,, (Remove this line)
,,,,,,Dora,,,,Explorer,,,,,,,2022,Not Enrolled
,,,,,,Peppa,,,,Diggs,,,,,,,2020,Enrolled
,,,,,,Conrad,,,,Strat,,,,,,,2020,Enrolled
,,,,,,Kat,Noir,,,,2019,,,,,,,Enrolled
,,,,,,Lance,,,,Bug,2018,,,,,,,Enrolled
'@
$List = ConvertFrom-Csv $Csv -Header @(1..20) # |Import-Csv .\Your.Csv -Header @(1..20)
$Columns = $List[0].PSObject.Properties.Where{ $_.Value -and $_.Value -ne 'Department' }.Name
$List |Select-Object -Property $Columns |Where-Object { $_.$($Columns[0]) } |
ConvertTo-Csv -UseQuote Never |Select-Object -Skip 1 # |Set-Content -Encoding utf8 out.csv
First Name,Last Name,School Year,Enrolment Status
Jane,Doe,2022,Enrolled
Jeff,Dane,2019,Enrolled
Tate,Anderson,2019,Not Enrolled
Daphne,Miller,2021,Enrolled
Cora,Dame,2022,Enrolled
Dora,Explorer,2022,Not Enrolled
Peppa,Diggs,2020,Enrolled
Conrad,Strat,2020,Enrolled
Kat,,,Enrolled
Lance,Bug,,Enrolled
Use a switch statement:
& {
$first = $true
switch -Wildcard -File in.csv { # Loop over all lines in file in.csv
',*' { # intra-department line
# Prepend the department name, eliminate empty fields and output.
$dept + ',' + (($_ -split ',' -ne '') -join ',')
}
default {
if ($first) { # header line
# Eliminate empty fields and output.
($_ -split ',' -ne '') -join ','
$first = $false
}
else { # department-only line
$dept = ($_ -split ',')[0].Trim() # save department name, without trailing whitespace
}
}
}
} | Set-Content -Encoding utf8 out.csv
Note:
$_ -split ',' splits each line into fields on commas, and -ne '' filters out the empty fields from the resulting array; -join ',' then rejoins the nonempty fields with single commas, which in effect collapses runs of adjacent commas and thereby eliminates empty fields.
If you don't mind the complexity of a regex, you can perform the above more simply with a single -replace operation, as shown in Toni's helpful answer.
Using switch -File is an efficient way to read files line by line and perform conditional processing based on sophisticated matching (as an alternative to -Wildcard you can use -Regex for regex matching, and you can even use script blocks ({ ... }) as conditionals).
As a language statement, switch cannot be used directly in a pipeline.
This limitation can be overcome by enclosing it in a script block ({ ... }) invoked with &, which enables the usual, memory-friendly streaming behavior in the pipeline; that is, the lines are processed one by one, as are the modified output lines relayed to Set-Content, so that the input file needn't be read into memory as a whole.
In your case, plain-text processing of your CSV file enabled a simple solution, but in general it is better to parse CSV files into objects whose properties you can work with, using the Import-Csv cmdlet, and, for later re-exporting to a CSV file, Export-Csv.
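As a rough sketch of the object-based approach mentioned in the last note, the following parses well-formed CSV text into objects, changes a property, and converts the result back to CSV. It uses ConvertFrom-Csv and ConvertTo-Csv on an in-memory string so that it is self-contained; with files you would use Import-Csv and Export-Csv instead. The sample data is made up for illustration:

```powershell
# Made-up, well-formed sample data (note the trailing space after "Psychology").
$csvText = @'
Department,First Name,Last Name,School Year
Psychology ,Jane,Doe,2022
Computer Science,Dora,Explorer,2022
'@

# Each data row becomes an object with one property per column.
$rows = ConvertFrom-Csv $csvText

foreach ($row in $rows) {
    # Work with properties instead of raw text, e.g. trim stray whitespace.
    $row.Department = $row.Department.Trim()
}

# Convert back to CSV text; pipe to Set-Content (or use Export-Csv) to save.
$rows | ConvertTo-Csv -NoTypeInformation
```

Column names that contain spaces can be accessed with quotes, for example $row.'First Name'.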

How to make changes to file content and save it to another file using powershell?

I want to do this
read the file
go through each line
if the line matches the pattern, do some changes with that line
save the content to another file
For now I use this script:
$file = [System.IO.File]::ReadLines("C:\path\to\some\file1.txt")
$output = "C:\path\to\some\file2.txt"
ForEach ($line in $file) {
if($line -match 'some_regex_expression') {
$line = $line.replace("some","great")
}
Out-File -append -filepath $output -inputobject $line
}
As you can see, here I write line by line. Is it possible to write the whole file at once?
A good example is provided here:
(Get-Content c:\temp\test.txt) -replace '\[MYID\]', 'MyValue' | Set-Content c:\temp\test.txt
But my problem is that I have an additional if statement...
So, what could I do to improve my script?
You could do it like that:
Get-Content -Path "C:\path\to\some\file1.txt" | foreach {
if($_ -match 'some_regex_expression') {
$_.replace("some","great")
}
else {
$_
}
} | Out-File -filepath "C:\path\to\some\file2.txt"
Get-Content reads a file line by line (array of strings) by default so you can just pipe it into a foreach loop, process each line within the loop and pipe the whole output into your file2.txt.
In this case an array or an ArrayList (lists are better for large collections) would be the most elegant solution. Simply add the strings to the list until the ForEach loop ends, then write the whole list to the file in one go.
This is an ArrayList example:
$file = [System.IO.File]::ReadLines("C:\path\to\some\file1.txt")
$output = "C:\path\to\some\file2.txt"
$outputData = New-Object System.Collections.ArrayList
ForEach ($line in $file) {
if($line -match 'some_regex_expression') {
$line = $line.replace("some","great")
}
[void]$outputData.Add($line) # Add() returns the new index; discard it
}
$outputData |Out-File $output
I think the if statement can be avoided in a lot of cases by using regular expression groups (e.g. (.*)) and placeholders (e.g. $1, $2 etc.).
As in your example:
(Get-Content .\File1.txt) -Replace 'some(_regex_expression)', 'great$1' | Set-Content .\File2.txt
And for the "good example", where [MYID] might be somewhere inline:
(Get-Content c:\temp\test.txt) -Replace '^(.*)\[MYID\](.*)$', '$1MyValue$2' | Set-Content c:\temp\test.txt
(see also How to replace first and last part of each line with powershell)

Dynamically replacing file content

I have to read one properties file (let's say prop.txt) and update it dynamically.
Content looks like this.
server.names=xyz[500],server2[500],test[500]
I wanted to replace the content anything after server.names= with correct values, e.g.:
server1.company.com[500],server2.company.com[500],server3.company.com[500]
I tried below command but it is replacing server.names=. I want to replace the values of server.names=
(Get-Content $path).Replace("server.names=",$NewServerNames) | Set-Content $path
Any idea how to replace the value of server.names=?
You were close, but your syntax is off. This solution utilizes regex to capture the original key:
$Pattern = 'server\.names='
(Get-Content -Path $Path) | # parentheses: read the whole file first, so it can be overwritten below
ForEach-Object {
If ($_ -match $Pattern)
{
$_ -replace "($Pattern).*", "`${1}$NewServerNames" # `$ keeps the group reference literal inside double quotes
}
Else
{
$_
}
} |
Set-Content -Path $Path
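One detail worth checking in any -replace call that mixes capture-group references with PowerShell variables is the quoting of the replacement string. The sketch below uses the sample line from the question and a made-up value for $NewServerNames. Inside double quotes, an unescaped $1 is expanded by PowerShell itself (as a variable named 1, which is normally empty) before the regex engine ever sees it:

```powershell
$line = 'server.names=xyz[500],server2[500],test[500]'
$NewServerNames = 'server1.company.com[500]'   # made-up replacement value

# Single quotes: $1 reaches the regex engine and refers to capture group 1.
$line -replace '(server\.names=).*', '$1new'

# Double quotes: escape the group reference with a backtick so PowerShell
# leaves it alone, while $NewServerNames is still expanded. Writing ${1}
# keeps the group number separate from whatever text follows it.
$updated = $line -replace '(server\.names=).*', "`${1}$NewServerNames"
$updated
```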

Copy specific lines from a text file to separate file using powershell

I am trying to get all the lines from an Input file starting with %% and paste it into Output file using powershell.
Used the following code, however I am only getting last line in Output file starting with %% instead of all the lines starting with %%.
I have only started to learn powershell, please help
$Clause = Get-Content "Input File location"
$Outvalue = $Clause | Foreach {
if ($_ -ilike "*%%*")
{
Set-Content "Output file location" $_
}
}
You are looping over the lines in the file, and setting each one as the whole content of the file, overwriting the previous file each time.
You need to either switch to using Add-Content instead of Set-Content, which will append to the file, or change the design to:
Get-Content "input.txt" | Foreach-Object {
if ($_ -like "%%*")
{
$_ # just putting this on its own, sends it on out of the pipeline
}
} | Set-Content Output.txt
Which you would more typically write as:
Get-Content "input.txt" | Where-Object { $_ -like "%%*" } | Set-Content Output.txt
and in the shell, you might write as
gc input.txt |? {$_ -like "%%*"} | sc output.txt
Where the whole file is filtered, and then all the matching lines are sent into Set-Content in one go, not calling Set-Content individually for each line.
NB. PowerShell is case insensitive by default, so -like and -ilike behave the same.
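A quick way to confirm the case-sensitivity point is to compare the three operator variants on a sample string (the string is made up for illustration). The c-prefixed variant is the explicitly case-sensitive one:

```powershell
$sample = '%%Header'

$sample -like  '%%h*'   # True:  case-insensitive by default
$sample -ilike '%%h*'   # True:  explicitly case-insensitive
$sample -clike '%%h*'   # False: explicitly case-sensitive
```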
For a small file, Get-Content is nice. But if you start trying to do this on heavier files, Get-Content will eat your memory and leave you hanging.
Keeping it REALLY simple for other PowerShell starters out there, you'll be better covered (and with better performance). So, something like this would do the job:
$inputfile = "C:\Users\JohnnyC\Desktop\inputfile.txt"
$outputfile = "C:\Users\JohnnyC\Desktop\outputfile.txt"
$reader = [io.file]::OpenText($inputfile)
$writer = [io.file]::CreateText($outputfile)
while($reader.EndOfStream -ne $true) {
$line = $reader.Readline()
if ($line -like '%%*') {
$writer.WriteLine($line);
}
}
$writer.Dispose();
$reader.Dispose();
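A slightly more defensive variant of the same reader/writer idea, sketched with hypothetical temp files so it can be run as-is: wrapping the work in try/finally ensures both files are closed even if an error occurs partway through.

```powershell
# Hypothetical temp files, used only to make the sketch self-contained.
$inputfile  = Join-Path ([IO.Path]::GetTempPath()) 'filter-in.txt'
$outputfile = Join-Path ([IO.Path]::GetTempPath()) 'filter-out.txt'
Set-Content $inputfile -Value '%%keep me', 'skip me', '%%keep me too'

$reader = [IO.File]::OpenText($inputfile)
try {
    $writer = [IO.File]::CreateText($outputfile)
    try {
        # ReadLine() returns $null at the end of the file.
        while ($null -ne ($line = $reader.ReadLine())) {
            if ($line -like '%%*') { $writer.WriteLine($line) }
        }
    }
    finally { $writer.Dispose() }   # always close the output file
}
finally { $reader.Dispose() }       # always close the input file

Get-Content $outputfile
```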

Powershell Find and replace first line of CSV only

I need to read in a CSV file and find and replace certain characters in the first line of the file only. I have used ForEach-Object, but this processes the entire file. Any thoughts on how this can best be achieved?
Here is the code :
Get-Content c:\output.csv | ForEach-Object { $_ -replace "objectGUID", 'StudentID' } | Set-Content c:\output2.csv
This won't fix the problem of having to process the entire file, but should substantially reduce the time it takes to do it if it's a substantially large file.
$Updated = $false
Get-Content c:\output.csv -ReadCount 1000 |
ForEach-Object {
if ($Updated)
{
$_ | Add-Content c:\output2.csv
}
else {
$_[0] = $_[0] -replace "objectGUID", 'StudentID'
$_ | Set-Content c:\output2.csv
$Updated = $true
}
}
Edit: if it's only 3000 rows this should be sufficient:
$FileContent = Get-Content c:\output.csv
$FileContent[0] = $FileContent[0] -replace 'objectGUID', 'StudentID'
$FileContent | Set-Content c:\output2.csv
Ok, Get-Content makes this simple enough if all you want to do is change the first line of a text file.
GC c:\output.csv|select -first 1|%{$_ -replace "objectGUID", 'StudentID'}|Out-File C:\Output2.csv
GC C:\output.csv -readcount 1000|Select -skip 1|Out-File C:\Output2.csv -Append
That will pull the first line only, replacing the text you wanted and write it to a new file (assuming you don't already have an Output2.csv file). After that it reads in the rest of the file skipping the first line and adds that to the same file. You can delete the original file after that and rename the output file if you feel the need.
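If reading the file twice is undesirable, another option is a single pass that uses a flag so the replacement is applied to the first line only. This is a minimal sketch on in-memory sample lines (made up for illustration); with a real file you would replace $lines with Get-Content and pipe the result to Set-Content:

```powershell
# Made-up sample: the second line also contains 'objectGUID' and must stay as-is.
$lines = 'objectGUID,Name', '123,objectGUID stays here', '456,Bob'

$isFirst = $true
$result = $lines | ForEach-Object {
    if ($isFirst) {
        # ForEach-Object runs its script block in the caller's scope,
        # so this assignment updates the flag defined above.
        $isFirst = $false
        $_ -replace 'objectGUID', 'StudentID'
    }
    else {
        $_   # pass all other lines through unchanged
    }
}
$result
```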