Remove additional commas in CSV file using PowerShell

I have a CSV file that I'd like to import into SQL, but it isn't properly formatted. I am not able to change the format of the generated file (an Excel file), so I'm looking to do this with the CSV file using PowerShell. I want to remove the extra commas and also replace the department name (,,,,,,) with the correct department, as seen in the example below. Thank you in advance.
Example:
Current Format:
Department,,,,,,First Name,,,,Last Name,,,,,,,School Year,Enrolment Status
Psychology ,,,,,,,,,,,,,,,,,,,,,,, (Remove this line)
,,,,,,Jane,,,,Doe,,,,,,,2022,Enrolled
,,,,,,Jeff,,,,Dane,,,,,,,2019,Enrolled
,,,,,,Tate,,,,Anderson,,,,,,,2019,Not Enrolled
,,,,,,Daphne,,,,Miller,,,,,,,2021,Enrolled
,,,,,,Cora,,,,Dame,,,,,,,2022,Enrolled
Computer Science ,,,,,,,,,,,,,,,,,,,,,,, (Remove this line)
,,,,,,Dora,,,,Explorer,,,,,,,2022,Not Enrolled
,,,,,,Peppa,,,,Diggs,,,,,,,2020,Enrolled
,,,,,,Conrad,,,,Strat,,,,,,,2020,Enrolled
,,,,,,Kat,Noir,,,,2019,,,,,,,Enrolled
,,,,,,Lance,,,,Bug,2018,,,,,,,Enrolled
Ideal format:
Department,First Name,Last Name,School Year,Enrolment Status
Psychology ,,,,,,,,,,,,,,,,,,,,,,, (Remove this line)
Psychology,Jane,Doe,2022,Enrolled
Psychology,Jeff,Dane,2019,Enrolled
Psychology,Tate,Anderson,2019,Not Enrolled
Psychology,Daphne,Miller,2021,Enrolled
Psychology,Cora,Dame,2022,Enrolled
Computer Science ,,,,,,,,,,,,,,,,,,,,,,, (Remove this line)
Computer Science,Dora,Explorer,2022,Not Enrolled
Computer Science,Peppa,Diggs,2020,Enrolled
Computer Science,Conrad,Strat,2020,Enrolled
Computer Science,Kat,Noir,2019,Enrolled
Computer Science,Lance,Bug,2018,Enrolled

here you go:
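$csvArray = New-Object System.Collections.Generic.List[string]
# Import the file, collapsing every run of two or more commas into one
$text = (Get-Content "C:\tmp\testdata.txt") -replace ",{2,}", ","
$arrayEnd = $text.Count - 1
# Skip the header (index 0) and walk the remaining lines
$text[1..$arrayEnd] | ForEach-Object {
    if ($_ -notmatch "^,") {
        # Department-only line: remember the name, minus commas and stray spaces
        $department = ($_ -replace ",").Trim()
    }
    else {
        # Data line (starts with a comma): prepend the current department
        $csvArray.Add($department + $_)
    }
}
# Put the cleaned header back on top and write the result
$csvArray.Insert(0, $text[0])
$csvArray | Set-Content 'C:\tmp\my.csv'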

Using the Csv cmdlets:
$Csv = @'
Department,,,,,,First Name,,,,Last Name,,,,,,,School Year,Enrolment Status
Psychology ,,,,,,,,,,,,,,,,,,,,,,, (Remove this line)
,,,,,,Jane,,,,Doe,,,,,,,2022,Enrolled
,,,,,,Jeff,,,,Dane,,,,,,,2019,Enrolled
,,,,,,Tate,,,,Anderson,,,,,,,2019,Not Enrolled
,,,,,,Daphne,,,,Miller,,,,,,,2021,Enrolled
,,,,,,Cora,,,,Dame,,,,,,,2022,Enrolled
Computer Science ,,,,,,,,,,,,,,,,,,,,,,, (Remove this line)
,,,,,,Dora,,,,Explorer,,,,,,,2022,Not Enrolled
,,,,,,Peppa,,,,Diggs,,,,,,,2020,Enrolled
,,,,,,Conrad,,,,Strat,,,,,,,2020,Enrolled
,,,,,,Kat,Noir,,,,2019,,,,,,,Enrolled
,,,,,,Lance,,,,Bug,2018,,,,,,,Enrolled
'@
$List = ConvertFrom-Csv $Csv -Header @(1..20) # |Import-Csv .\Your.Csv -Header @(1..20)
$Columns = $List[0].PSObject.Properties.Where{ $_.Value -and $_.Value -ne 'Department' }.Name
$List |Select-Object -Property $Columns |Where-Object { $_.$($Columns[0]) } |
ConvertTo-Csv -UseQuote Never |Select-Object -Skip 1 # |Set-Content -Encoding utf8 out.csv
First Name,Last Name,School Year,Enrolment Status
Jane,Doe,2022,Enrolled
Jeff,Dane,2019,Enrolled
Tate,Anderson,2019,Not Enrolled
Daphne,Miller,2021,Enrolled
Cora,Dame,2022,Enrolled
Dora,Explorer,2022,Not Enrolled
Peppa,Diggs,2020,Enrolled
Conrad,Strat,2020,Enrolled
Kat,,,Enrolled
Lance,Bug,,Enrolled

Use a switch statement:
& {
  $first = $true
  switch -Wildcard -File in.csv { # Loop over all lines in file in.csv
    ',*' { # intra-department line
      # Prepend the department name, eliminate empty fields and output.
      $dept + ',' + (($_ -split ',' -ne '') -join ',')
    }
    default {
      if ($first) { # header line
        # Eliminate empty fields and output.
        ($_ -split ',' -ne '') -join ','
        $first = $false
      }
      else { # department-only line
        $dept = ($_ -split ',')[0].Trim() # save department name, without trailing spaces
      }
    }
  }
} | Set-Content -Encoding utf8 out.csv
Note:
$_ -split ',' splits each line into fields by ,, and -ne '' filters out empty fields from the resulting array; applying -join ',' rejoins the nonempty fields with ,, which in effect removes multiple adjacent , and thereby eliminates empty fields.
If you don't mind the complexity of a regex, you can perform the above more simply with a single -replace operation, as shown in Toni's helpful answer.
Using switch -File is an efficient way to read files line by line and perform conditional processing based on sophisticated matching (as an alternative to -Wildcard you can use -Regex for regex matching, and you can even use script blocks ({ ... }) as conditionals).
As a language statement, switch cannot be used directly in a pipeline.
This limitation can be overcome by enclosing it in a script block ({ ... }) invoked with &, which enables the usual, memory-friendly streaming behavior in the pipeline; that is, the lines are processed one by one, as are the modified output lines relayed to Set-Content, so that the input file needn't be read into memory as a whole.
In your case, plain-text processing of your CSV file enabled a simple solution, but in general it is better to parse CSV files into objects whose properties you can work with, using the Import-Csv cmdlet, and, for later re-exporting to a CSV file, Export-Csv.
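The field-collapsing idiom described in the notes can be tried in isolation. The following is a minimal sketch, using one data line from the question as sample input; the variable names are made up for illustration, and the regex variant shown is just one possible way to get the same result with -replace.

```powershell
# Sample line taken from the question's input.
$line = ',,,,,,Jane,,,,Doe,,,,,,,2022,Enrolled'

# Split into fields, drop the empty ones, rejoin with single commas.
$viaSplit = ($line -split ',' -ne '') -join ','

# Regex alternative: collapse each run of commas into one,
# then trim the comma left over at the start (or end) of the line.
$viaReplace = ($line -replace ',+', ',').Trim(',')

$viaSplit     # Jane,Doe,2022,Enrolled
$viaReplace   # Jane,Doe,2022,Enrolled
```

Both forms silently drop fields that are legitimately empty, so they only suit input like this one, where every empty field is padding.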

Related

Add missing comma to a CSV with Powershell

I have a CSV which I process using powershell where occasionally one or more of the rows will be missing one of the comma delimiters. It will always have 3 columns and the 2nd column is optional.
Ex.
Col1,Col2,Col3
SomeCol1Val,,SomeCol3Val
AnotherCol1Val,AnotherCol3Val
In the above example I need to add another comma to Row #2
I've been able to determine which row needs to be updated and how to change the value, but I'm not sure how to overwrite that specific row in the file.
$csvFile = Get-Content "C:\MyFile.csv"
foreach($row in $csvFile) {
    $cnt = ($row.ToCharArray() -eq ',').count
    if ($cnt -eq 1) {
        $row = $row -replace ",",",,"
    }
}
Thanks
As Doug Maurer points out, all that is missing from your code is to write the updated $row values back to your input file, using the Set-Content cmdlet.
However, I suggest a different, faster approach, using a switch statement with the -File option and a single -replace operation based on a regex.
$csvFile = 'C:\MyFile.csv'
$newContent =
  switch -File $csvFile {
    default { $_ -replace '^([^,]+),([^,]+)$', '$1,,$2' }
  }
Set-Content $csvFile -Value $newContent -WhatIf
Note: The -WhatIf common parameter in the command above previews the operation. Remove -WhatIf once you're sure the operation will do what you want.
Note that you may have to use the -Encoding parameter to specify the desired character encoding, which in Windows PowerShell is the active ANSI code page and in PowerShell [Core] v6+ BOM-less UTF-8.
If you wanted to stick with your original approach:
$csvFile = 'C:\MyFile.csv'
$newContent =
  foreach ($row in Get-Content $csvFile) {
    if (($row.ToCharArray() -eq ',').Count -eq 1) {
      $row -replace ',', ',,'
    } else {
      $row
    }
  }
Set-Content $csvFile -Value $newContent -WhatIf
Note that both approaches collect all (modified) lines in memory as a whole, so as to speed up the operation and also to allow writing back to the input file.
However, it is possible to stream the output, to a different file - i.e. to write the output file line by line - by enclosing the switch statement in & { ... } and piping that to Set-Content. With your Get-Content approach you'd have to use
Get-Content ... | ForEach-Object { ... } | Set-Content instead.
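The streaming variant mentioned above might look like the following sketch. The temporary file names are assumptions chosen so the example runs anywhere; in practice you would substitute your own input path and a different output path.

```powershell
# Hypothetical sample files in the temp folder (names are assumptions).
$inFile  = Join-Path ([System.IO.Path]::GetTempPath()) 'MyFile.csv'
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'MyFile.fixed.csv'
Set-Content $inFile -Value 'Col1,Col2,Col3', 'SomeCol1Val,,SomeCol3Val', 'AnotherCol1Val,AnotherCol3Val'

# Wrapping switch in & { ... } lets its output stream through the pipeline,
# so each fixed line is written as soon as it has been processed.
& {
  switch -File $inFile {
    # Only lines with exactly one comma match; all other lines pass through unchanged.
    default { $_ -replace '^([^,]+),([^,]+)$', '$1,,$2' }
  }
} | Set-Content $outFile

Get-Content $outFile
```

Because the input is still being read while the output is written, the output file has to be a different file from the input.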

Is there a way to merge similar lines using Powershell?

Suppose I have two csv files. One is
id_number,location_code,category,animal,quantity
12212,3,4,cat,2
29889,7,6,dog,2
98900,
33221,1,8,squirrel,1
the second one is:
98900,2,1,gerbil,1
The second file may have a newline or something at the end (maybe or maybe not, I haven't checked), but only the one line of content. There may be three or four or more different varieties of the "second" file, but each one will have a first element (98900 in this example) that corresponds to an incomplete line in the first file similar to what is in this example.
Is there a way using powershell to automatically merge the line in the second (plus any additional similar) csv file into the matching line(s) of the first file, so that the resulting file is:
12212,3,4,cat,2
29889,7,6,dog,2
98900,2,1,gerbil,1
33221,1,8,squirrel,1
main.csv
id_number,location_code,category,animal,quantity
12212,3,4,cat,2
29889,7,6,dog,2
98900,
33221,1,8,squirrel,1
correction_001.csv
98900,2,1,gerbil,1
merge code used at the commandline, or in the .ps1 file of your choice
$myHeader = @('id_number','location_code','category','animal','quantity')
#Stage all the correction files: last correction in the most recent file wins
$ToFix = @{}
filter Plumbing_Import-Csv($Header){import-csv -LiteralPath $_ -Header $Header}
ls correction*.csv | sort -Property LastWriteTime | Plumbing_Import-Csv $myHeader | %{$ToFix[$_.id_number]=$_}
function myObjPipe($Header){
    begin{
        function TextTo-CsvField([String]$text){
            #text fields which contain comma, double quotes, or new-line are a special case for CSV fields and need to be accounted for
            if($text -match '"|,|\n'){return '"'+($text -replace '"','""')+'"'}
            return $text
        }
        function myObjTo-CsvRecord($obj){
            return ''+
                $obj.id_number +','+
                $obj.location_code +','+
                $obj.category +','+
                (TextTo-CsvField $obj.animal)+','+
                $obj.quantity
        }
        $Header -join ','
    }
    process{
        if($ToFix.Contains($_.id_number)){
            $out = $ToFix[$_.id_number]
            $ToFix.Remove($_.id_number)
        }else{$out = $_}
        myObjTo-CsvRecord $out
    }
    end{
        #I assume you'd append any leftover fixes that weren't used
        foreach($out in $ToFix.Values){
            myObjTo-CsvRecord $out
        }
    }
}
import-csv main.csv | myObjPipe $myHeader | sc combined.csv -encoding ascii
You could also use ConvertTo-Csv, but my preference is to not have all the extra " cruft.
Edit 1: reduced code redundancy, accounted for \n, fixed appends, and used @OwlsSleeping's suggestion about the -Header cmdlet parameter
also works with these files:
correction_002.csv
98900,2,1,I Win,1
correction_new.csv
98901,2,1,godzilla,1
correction_too.csv
98902,2,1,gamera,1
98903,2,1,mothra,1
Edit 2: convert gc | ConvertTo-Csv over to Import-Csv to fix the front-end \n issues. Now also works with:
correction_003.csv
29889,7,6,"""bad""
monkey",2
This is a simple solution assuming there's always exactly one match, and you don't care about output order. Change the output path to csv1 to overwrite.
I added headers manually in both input files, but you can specify them in Import-Csv instead if you'd rather avoid changing your files.
[array]$MissingLine = Import-Csv -Path "C:\Users\me\Documents\csv2.csv"
[string]$MissingId = $MissingLine[0].id_number
[array]$BigCsv = Import-Csv -Path "C:\Users\me\Documents\csv1.csv" |
    Where-Object {$_.id_number -ne $MissingId}
($BigCsv + $MissingLine) |
    Export-Csv -Path "C:\Users\me\Documents\Combined.csv"

Powershell replace text once per line

I have a PowerShell script, and I am trying to work out one part of it. The text input lists each user and the group they are part of. The script is supposed to replace that group with the groups I am assigning them in Active Directory (I am limited to only changing groups in Active Directory). My issue is that when it reaches HR and replaces it, it then continues and also replaces the HR inside CHRL in the new text, so my groups look nuts right now. Looking it over, it doesn't happen on every line, but for gilchrist it also puts something in for the HR in the name. Is there anything I can do to keep it from changing, or am I going to have to change my HR to Human Resources? Thanks for the help.
$lookupTable = @{
    'Admin' = 'W_CHRL_ADMIN_GS,M_CHRL_ADMIN_UD,M_CHRL_SITE_GS'
    'Security' = 'W_CHRL_SECURITY_GS,M_CHRL_SITE_GS'
    'HR' = 'M_CHRL_HR_UD,W_CHRL_HR_GS,M_CHRL_SITE_GS'
}
$original_file = 'c:\tmp\test.txt'
$destination_file = 'c:\tmp\test2.txt'
Get-Content -Path $original_file | ForEach-Object {
    $line = $_
    $lookupTable.GetEnumerator() | ForEach-Object {
        if ($line -match $_.Key)
        {
            $line = $line -replace $_.Key, $_.Value
        }
    }
    $line
} | Set-Content -Path $destination_file
Get-Content $destination_file
test.txt:
user,group
john.smith,Admin
joanha.smith,HR
john.gilchrist,security
aaron.r.smith,admin
abby.doe,secuity
abigail.doe,admin
Your input appears to be in CSV format (though note that your sample rows have trailing spaces, which you'd have to deal with, if they're part of your actual data).
Therefore, use Import-Csv and Export-Csv to read / rewrite your data, which allows a more concise and convenient solution:
Import-Csv test.txt |
  Select-Object user, @{ Name='group'; Expression = { $lookupTable[$_.group] } } |
    Export-Csv -NoTypeInformation -Encoding Utf8 test2.txt
Import-Csv reads the CSV file as a collection of custom objects whose properties correspond to the CSV column values; that is, each object has a .user and a .group property in your case.
$_.group therefore robustly reports the abstract group name only, which you can directly pass to your lookup hashtable; Select-Object is used to pass the original .user value through, and to replace the original .group value with the lookup result, using a calculated property.
Export-Csv re-converts the custom objects to a CSV file:
-NoTypeInformation suppresses the (usually useless) data-type-information line at the top of the output file
-Encoding Utf8 was added to prevent potential data loss, because Export-Csv in Windows PowerShell uses ASCII encoding by default.
Note that Export-Csv blindly double-quotes all field values, whether they need it or not; that said, CSV readers should be able to deal with that (and Import-Csv certainly does).
As for what you tried:
The -replace operator replaces all occurrences of a given regex (regular expression) in the input.
Your regexes amount to looking for (case-insensitive) substrings, which explains why HR matches both the HR group name and the substring hr in the username gilchrist.
A simple workaround would be to add assertions to your regex so that the substrings only match where you want them; e.g.: ,HR$ would only match after a , at the end of a line ($).
However, your approach of enumerating the hashtable keys for each input CSV row is inefficient, and you're better off splitting off the group name and doing a straight lookup based on it:
# Split the row into fields.
$fields = $line -split ','
# Update the group value (last field)
$fields[-1] = $lookupTable[$fields[-1]]
# Rebuild the line
$line = $fields -join ','
Note that you'd have to make an exception for the header row (e.g., test if the lookup result is empty and refrain from updating, if so).
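Putting the split-and-lookup steps together with the header exception could look like this sketch. It uses the lookup table from the question and a few in-memory sample rows; the $lines and $result variables and the .Trim() call (to cope with trailing spaces) are assumptions added for illustration.

```powershell
$lookupTable = @{
  'Admin'    = 'W_CHRL_ADMIN_GS,M_CHRL_ADMIN_UD,M_CHRL_SITE_GS'
  'Security' = 'W_CHRL_SECURITY_GS,M_CHRL_SITE_GS'
  'HR'       = 'M_CHRL_HR_UD,W_CHRL_HR_GS,M_CHRL_SITE_GS'
}

# In-memory stand-in for the lines of test.txt.
$lines = 'user,group', 'john.smith,Admin', 'joanha.smith,HR', 'john.gilchrist,security'

$result = foreach ($line in $lines) {
  # Split the row into fields and look up the last one (the group).
  $fields = $line -split ','
  $replacement = $lookupTable[$fields[-1].Trim()]
  # Only update when the lookup found something; this leaves the header
  # row (and any unknown group name) untouched.
  if ($replacement) { $fields[-1] = $replacement }
  $fields -join ','
}
$result
```

Hashtable literals in PowerShell look up string keys case-insensitively, which is why security finds the Security entry, and because only the last field is looked up, the hr inside gilchrist is never touched.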
Why don't you load your text file as a CSV file, using Import-CSV and use "," as a delimiter?
This will give you a PowerShell object you can work on and then export as text or CSV. If I use your file and lookup table, this code may help you:
$file = Import-Csv -Delimiter "," -Path "c:\ps\test.txt"
$lookupTable = @{
    'Admin' = 'W_CHRL_ADMIN_GS,M_CHRL_ADMIN_UD,M_CHRL_SITE_GS'
    'Security' = 'W_CHRL_SECURITY_GS,M_CHRL_SITE_GS'
    'HR' = 'M_CHRL_HR_UD,W_CHRL_HR_GS,M_CHRL_SITE_GS'}
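foreach ($i in $file) {
    #Compare and replace
    ...
}
# Export-Csv takes the objects from the pipeline; the output path below is an assumption
$file | Export-Csv -Path "c:\ps\test2.txt" -Delimiter "," -NoTypeInformation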
You can then iterate over $file and compare and replace. You can also use Export-Csv after you're done.

Grabbing specific sections of a txt file via Powershell

I am new to Powershell scripting, but I feel I am overlooking a simple answer, hopefully some of you can help.
My company exports files from all of our computers with a section around the middle of Mapped Network Printers. It looks like this:
-------------------------------------------------------------------------
Mapped Network Printers:
NetworkAddress\HP425DN [DEFAULT PRINTER]
-------------------------------------------------------------------------
Local Printers:
What I have been asked to do is copy just the Mapped Network Printers to a new text file.
I tried using Select-String with a context parameter, but I have no way of knowing how many network printers there are, so I can't guess.
I also tried using the following code which I found on this site, but it returns nothing:
$MapPrint = gc C:\Users\User1\Documents\Config.txt
$from = ($MapPrint | Select-String -pattern "Mapped Network Printers:" |
    Select-Object LineNumber).LineNumber
$to = ($MapPrint | Select-String -pattern "---------------------------------------------------------------------------" |
    Select-Object LineNumber).LineNumber
$i = 0
$array = @()
foreach ($line in $MapPrint)
{
    foreach-object { $i++ }
    if (($i -gt $from) -and ($i -lt $to))
    {
        $array += $line
    }
}
$array
I basically want to start the search at "Mapped Network Printers" and end it at the next row of "------"
Any help would be greatly appreciated.
Select-String has no feature for extracting a range of lines based on content.
The simplest approach is to read the file as a whole and use the -replace operator to extract the range via a regular expression (regex):
$file = 'C:\Users\User1\Documents\Config.txt'
$regex = '(?sm).*^Mapped Network Printers:\r?\n(.*?)\r?\n---------------------.*'
(Get-Content -Raw $file) -replace $regex, '$1'
Reading an input file as a whole can be problematic with files too large to fit into memory, but that's probably not a concern for you.
On the plus side, this approach is much faster than processing the lines in a loop.
Get-Content -Raw (PSv3+) reads the input file as a whole.
Inline regex options (?sm) turn on both the multi-line and the single-line option:
m means that ^ and $ match the start and end of each line rather than the input string as a whole.
s means that metacharacter . matches \n characters too, so that an expression such as .* can be used to match across lines.
\r?\n matches a single line break, both the CRLF and the LF variety.
(.*?) is the capture group that (non-greedily) captures everything between the bounding lines.
Note that the regex matches the entire input string, and then replaces it with just the substring (range) of interest, captured in the 1st (and only) capture group ($1).
Assuming that $file contains:
-------------------------------------------------------------------------
Mapped Network Printers:
NetworkAddress\HP425DN [DEFAULT PRINTER]
NetworkAddress\HP426DN
-------------------------------------------------------------------------
Local Printers:
the above yields:
NetworkAddress\HP425DN [DEFAULT PRINTER]
NetworkAddress\HP426DN
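One thing to keep in mind is that -replace returns its input unchanged when the regex doesn't match, so a file without the section would come back whole. A minimal sketch of a -match based variant that makes the no-match case detectable follows; the $text and $printers variables and the shortened -{5,} separator pattern are assumptions for illustration, and the here-string stands in for the file content.

```powershell
# Stand-in for (Get-Content -Raw $file).
$text = @'
-------------------------------------------------------------------------
Mapped Network Printers:
NetworkAddress\HP425DN [DEFAULT PRINTER]
NetworkAddress\HP426DN
-------------------------------------------------------------------------
Local Printers:
'@

# -match fills the automatic $Matches variable; group 1 holds the lines
# between the section header and the next dashed separator line.
if ($text -match '(?sm)^Mapped Network Printers:\r?\n(.*?)\r?\n-{5,}') {
  $printers = $Matches[1] -split '\r?\n'
} else {
  $printers = @()   # section not found
}
$printers
```

Splitting the captured block on line breaks also gives you one array element per printer, which is convenient if the list is processed further rather than just written to a file.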
You could use Select-String or Where-Object to look for words with a \. Taking that even further you could look for just the server\printer values with a RegEx like this:
Get-Content C:\Users\User1\Documents\Config.txt -Raw |
Select-String '[A-Z0-9]+\\[A-Z0-9]+' -AllMatches |
ForEach-Object {$_.Matches.Value}
Note that this makes the assumption the Server Names and Printers use only A-Z and 0-9, you may need to look for more characters if that is not a valid assumption.
Here would be an example of using Where-Object to filter for lines with \
Get-Content 'C:\Users\User1\Documents\Config.txt' | Where-Object {$_ -like '*\*'}
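$Doc = "C:\temp\test.txt"
$Doc_end = "C:\temp\testfiltered.txt"
$reader = [System.IO.File]::OpenText($Doc)
$cdata = ""
$Read = 0   # 1 while inside the "Mapped Network Printers:" section
while ($null -ne ($line = $reader.ReadLine()))
{
    # A dashed separator line ends the section
    if ($line -like '---*') { $Read = 0 }
    # Collect lines while inside the section
    if ($Read -eq 1) { $cdata += $line + "`r`n" }
    # The header line starts the section (checked last so it is not collected itself)
    if ($line -like 'Mapped Network Printers:*') { $Read = 1 }
}
$reader.Close()   # release the file handle
$cdata | Out-File $Doc_end -Force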
You can do what you are attempting with the foreach-object command and a few additional test conditions. Simply setting a flag when you encounter the Mapped Network Printers: line and then terminating output on the next line -like "---*" will work, e.g.
## positional parameters
param(
    [Parameter(Mandatory=$true)][string]$infile
)
$beginprn = 0
get-content $infile | foreach-object {
    # terminate condition
    if ([int]$beginprn -eq 1 -and $_ -like "---*") {
        break
    }
    # output Mapped printers
    if ([int]$beginprn -eq 1) {
        write-host $_
    }
    # begin condition
    if ($_ -eq "Mapped Network Printers:") {
        $beginprn = 1
    }
}
Example Input File
-------------------------------------------------------------------------
Mapped Network Printers:
NetworkAddress\HP425DN [DEFAULT PRINTER]
NetworkAddress\HP4100N
-------------------------------------------------------------------------
Local Printers:
Example Use/Output
PS> parseprn.ps1 .\tmp\prnfile.txt
NetworkAddress\HP425DN [DEFAULT PRINTER]
NetworkAddress\HP4100N

Powershell Import-csv with return character

I tried the following to turn a text file into a document by leveraging import-csv where each item in the original document was a new line
Sample file.txt
James Cameron
Kirk Cobain
Linda Johnson
Code:
$array = import-csv file.txt | ConvertFrom-Csv -Delim `r
foreach ($Data in $array)
{
if (sls $Data Master.txt -quiet)
{Add-Content file.txt $Data}
}
It never created the document.
Import-Csv takes a CSV and outputs PSCustomObjects. It's intended for when the file has a header row, and it reads that as the properties of the objects. e.g.
FirstName,LastName
James,Cameron
Kirk,Cobain
# ->
@{FirstName='James';LastName='Cameron'}
@{FirstName='Kirk';LastName='Cobain'}
etc.
If your file has no header row, it will take the first row and then ruin everything else afterwards. You need to provide the -Header 'h1','h2',... parameter to fix that. So you could use -Header Name, but your data only has one property, so there's not much benefit.
ConvertFrom-Csv is intended to do the same thing, but from CSV data in a variable instead of a file. They don't chain together usefully. It will try, but what you end up with is...
A single object, with a property called '@{James=Kirk}' and a value of '@{James=Linda}', where 'James' was taken from line 1 as a column header, and the weird syntax is from forcing those objects through a second conversion.
It's not at all clear why you are reading in from file.txt and adding to file.txt. But since you don't have a CSV, there's no benefit from using the CSV cmdlets.
$lines = Get-Content file.txt
$master = Get-Content master.txt
foreach ($line in $lines)
{
if ($master -contains $line)
{
Add-Content file2.txt $line
}
}
or just
gc file.txt |? { sls $_ master.txt -quiet } | set-content file2.txt
Auto-generated PS help links from my codeblock (if available):
gc is an alias for Get-Content (in module Microsoft.PowerShell.Management)
? is an alias for Where-Object
sls is an alias for Select-String (in module Microsoft.PowerShell.Utility)
Set-Content (in module Microsoft.PowerShell.Management)
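For completeness, the -Header approach mentioned earlier can be sketched as follows. The sample names are the ones from the question, supplied in memory so the example is self-contained; with a real file you would use Import-Csv file.txt -Header Name instead, and the $lines and $people variables are made up for illustration.

```powershell
# In-memory stand-in for the lines of file.txt (no header row).
$lines = 'James Cameron', 'Kirk Cobain', 'Linda Johnson'

# -Header supplies the column name, so the first line is treated as data.
$people = $lines | ConvertFrom-Csv -Header Name

$people.Name
```

As noted above, with a single column this gains little over plain Get-Content; it mainly shows that no line is consumed as a header.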