Reformat column names in a csv with PowerShell


Question
How do I reformat an unknown CSV column name according to a formula or subroutine (e.g. rename column " Arbitrary Column Name " to "Arbitrary Column Name" by running a trim or regex or something) while maintaining data?
Goal
I'm trying to more or less sanitize columns (the names) in a hand-produced (or at least hand-edited) csv file that needs to be processed by an existing PowerShell script. In this specific case, the columns have spaces that would be removed by a call to [String]::Trim(), or which could be ignored with an appropriate regex, but I can't figure a way to call or use those techniques when importing or processing a CSV.
Short Background
Most files and columns have historically been entered into the CSV properly, but recently a few columns were being dropped during processing. I determined it was because a column name contained a stray space (e.g., Select-Object was being told to get "RFC", but Import-Csv retrieved "RFC ", so no matchy-matchy). Telling the customer to enter it correctly by hand (though preferred and much simpler) is not an option in this case.
Options considered
I could manually process the text of the file, but that is a messy and error prone way to re-invent the wheel. I wonder if there's a syntax with Select-Object that would allow a softer match for column names, but I can't find that info.
The closest I have come conceptually is using a calculated property in the call to Select-Object to rename the column, but I can only find ways to rename a known column to another known column. So, this would require enumerating the columns and matching them exactly (preferred) or a softer match (like comparing after trimming or matching via regex as a fallback) with expected column names, then creating a collection of name mappings to use in constructing calculated properties from that information to select into a new object.
That seems like it would work, but it's more work than I'd prefer, and I can't help but hope that there's a simpler way I haven't been able to find via Google. Maybe I should try Bing?

Sample File
Let's say you have a file.csv like this:
" RFC "
"1"
"2"
"3"
Code
Now try to run the following:
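# Parse the header row plus one data row so the column names become property names
$CSV = Get-Content file.csv -First 2 | ConvertFrom-Csv
# Trim the stray spaces from every header name
$FixedHeaders = $CSV.PSObject.Properties.Name.Trim(' ')
# Re-import with the cleaned headers; the original header row is now read as data, so skip it
Import-Csv file.csv -Header $FixedHeaders |
    Select-Object -Skip 1 -Property RFC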
Output
You will get this output:
RFC
---
1
2
3
Explanation
First we use Get-Content with parameter -First 2 to get the first two lines. Piping to ConvertFrom-Csv will allow us to access the headers with PSObject.Properties.Name. Use Import-Csv with the -Header parameter to use the trimmed headers. Pipe to Select-Object and use -Skip 1 to skip the original headers.
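The steps above can be wrapped in a small helper. This is only a minimal sketch: the function name Import-CsvWithTrimmedHeaders is made up for illustration, and it assumes the file has at least one data row (ConvertFrom-Csv returns nothing for header-only input) and that no two headers become identical after trimming.

```powershell
# Hypothetical helper name; wraps the two-pass technique described above.
function Import-CsvWithTrimmedHeaders {
    param(
        [Parameter(Mandatory)]
        [string] $Path
    )

    # Parse only the header row plus one data row to learn the raw column names.
    $probe = Get-Content -Path $Path -TotalCount 2 | ConvertFrom-Csv
    $headers = @($probe.PSObject.Properties.Name | ForEach-Object { $_.Trim() })

    # Re-import with the cleaned names. With -Header, the original header row
    # is read as a data row, so drop it.
    Import-Csv -Path $Path -Header $headers | Select-Object -Skip 1
}
```

It would be called as Import-CsvWithTrimmedHeaders -Path file.csv | Select-Object RFC, so the existing script can keep selecting columns by their expected names.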

I'm not sure about comparisons in terms of efficiency, but I think this is a little more hardened, and imports the CSV only once. You might be able to use @lahell's approach and Get-Content -Raw, but this was done and it works, so I'm gonna leave it to the community to determine which is better...
# Import the CSV
$rawCSV = Import-Csv $Path
# Get the actual header names and map them to their reformatted versions
$CSVColumns = @{}
$rawCSV |
    Get-Member |
    Where-Object {$_.MemberType -eq "NoteProperty"} |
    Select-Object -ExpandProperty Name |
    ForEach-Object {
        # Map a trimmed, whitespace-reduced version of the name back to the original
        $CSVColumns.Add(($_.Trim() -replace '(\s)\s+', '$1'), "$_")
    }
# Create the array of names and calculated properties to pass to Select-Object
$SelectColumns = @()
$CSVColumns.GetEnumerator() |
    ForEach-Object {
        # Evaluate the if/else now (not as a stored script block) and append its result
        $SelectColumns += $(
            if ($CSVColumns.Values -contains $_.Key) { $_.Key }
            else { @{Name = $_.Key; Expression = $CSVColumns[$_.Key]} }
        )
    }
$FormattedCSV = $rawCSV |
    Select-Object $SelectColumns
This was hand-copied to a computer where I don't have the rights to run it, so there might be an error - I tried to copy it correctly

You can use gocsv (https://github.com/DataFoxCo/gocsv) to inspect the headers of the CSV. You can then rename the headers, behead the file, swap columns, join, merge, or apply any number of other transformations.

Related

Powershell: how to retrieve powershell commands from a csv and execute one by one, then output the result to the new csv

I have a Commands.csv file like:
| Command |
| -----------------------------------------------|
|(Get-FileHash C:\Users\UserA\Desktop\File1).Hash|
|(Get-FileHash C:\Users\UserA\Desktop\File2).Hash|
|(Get-FileHash C:\Users\UserA\Desktop\File3).Hash|
Header name is "Command"
My idea is to:
Use ForEach ($line in Get-Content C:\Users\UserA\Desktop\Commands.csv ) {echo $line}
Execute $line one by one via powershell.exe, then output a result to a new .csv file - "result.csv"
Can you give me some directions and suggestions to implement this idea? Thanks!

Important:
Only use the technique below with input files you either fully control or implicitly trust to not contain malicious commands.
To execute arbitrary PowerShell statements stored in strings, you can use Invoke-Expression, but note that it should typically be avoided, as there are usually better alternatives - see this answer.
There are advanced techniques that let you analyze the statements before executing them and/or let you use a separate runspace with a restrictive language mode that limits what kinds of statements are allowed to execute, but that is beyond the scope of this answer.
Given that your input file is a .csv file with a Command column, import it with Import-Csv and access the .Command property on the resulting objects.
Use Get-Content only if your input file is a plain-text file without a header row, in which case the extension should really be .txt. (If it has a header row but there's only one column, you could get away with Get-Content Commands.csv | Select-Object -Skip 1 | ...). If that is the case, use $_ instead of $_.Command below.
To also use the CSV format for the output file, all commands must produce objects of the same type or at least with the same set of properties. The sample commands in your question output strings (the value of the .Hash property), which cannot meaningfully be passed to Export-Csv directly, so a [pscustomobject] wrapper with a Result property is used, which will result in a CSV file with a single column named Result.
Import-Csv Commands.csv |
    ForEach-Object {
        [pscustomobject] @{
            # !! SEE CAVEAT AT THE TOP.
            Result = Invoke-Expression $_.Command
        }
    } |
    Export-Csv -NoTypeInformation Results.csv
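To try the pattern without touching any files, here is a minimal sketch that uses an in-memory stand-in for Commands.csv. The two sample commands are harmless placeholders, the column is assumed to be named Command as stated in the question, and the extra Command column in the output is an addition for traceability rather than part of the approach above.

```powershell
# In-memory stand-in for Commands.csv; the commands are harmless placeholders.
$rows = @'
Command
1 + 1
'abc'.ToUpper()
'@ | ConvertFrom-Csv

$results = $rows | ForEach-Object {
    [pscustomobject] @{
        Command = $_.Command
        # Only run strings you trust: Invoke-Expression executes them as code.
        Result  = Invoke-Expression $_.Command
    }
}

# Show the CSV text that Export-Csv would write, without writing to disk.
$results | ConvertTo-Csv -NoTypeInformation
```

Keeping the command text next to its result makes it easier to match rows in the output file back to the input.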

Alternative way to remove duplicates from CSV other than Sort-Object -unique?

I have a bug I cannot beat. When my script reaches this chunk of code, it incorrectly removes unique values:
import-csv "$LocalPath\A1-$abbrMonth$Year.csv" |
    where {$_."CustomerName" -match $Customersregex} |
    select "SubmitterID","SubmitterName","JobDate","JobTime",@{Name="Form";Expression={if ($_.FormName -match "Copy"){"C"};if ($_.FormName -match "Letter"){"L"} else {""} }},"TotalDocs",@{Name="AddnPages";Expression={$_.TotalAdditionalPages}},"InputFilename",@{Name="ActualDocs";Expression={[string]([int]$_.RegularDocs + [int]$_.UnqualifiedDocs)}}|
    sort "InputFilename" -Unique |
    export-csv "$LocalPath\A2-$abbrMonth$Year.csv" -NoTypeInformation
It occurs at the sort "InputFilename" -Unique line. It works properly when I cut the pipeline up and execute it line by line, but not in the original script.
Is there any other way to remove duplicates based on the value of a column? I've tried using the "-unique" parameter on the Select-Object statement but I can't find a way to limit it to only one column.
EDIT: To clarify the issue I'm having, I have a LARGE list of accounting data. I'm trying to remove duplicate entries by using "Sort -unique". After the above code runs, entries are missing that should not be, because they are unique. I can isolate them in their own CSV, run the above code, and all the entries that should be present are there. However, when I run my master CSV file through the above code (and only that code, nothing else) and search for those entries, they are missing.
EDIT 2: Looks like it was an issue with the data file. Good grief.

You can always group things, then expand the first item in the group. It's not fast, but it works for what you're doing.
import-csv "$LocalPath\A1-$abbrMonth$Year.csv" |
    where {$_."CustomerName" -match $Customersregex} |
    group InputFilename |
    % { $_.Group[0] } |
    select "SubmitterID","SubmitterName","JobDate","JobTime",@{Name="Form";Expression={if ($_.FormName -match "Copy"){"C"};if ($_.FormName -match "Letter"){"L"} else {""} }},"TotalDocs",@{Name="AddnPages";Expression={$_.TotalAdditionalPages}},"InputFilename",@{Name="ActualDocs";Expression={[string]([int]$_.RegularDocs + [int]$_.UnqualifiedDocs)}}|
    sort "InputFilename" |
    export-csv "$LocalPath\A2-$abbrMonth$Year.csv" -NoTypeInformation
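Since grouping is noted above as not being fast, here is a small sketch that puts it side by side with a HashSet-based filter, a different technique from the one in the answer. The sample rows are invented, and the `::new()` syntax assumes PowerShell 5 or later. Both variants keep the first row seen for each InputFilename.

```powershell
# Invented sample rows; only InputFilename matters for de-duplication.
$rows = @'
InputFilename,TotalDocs
a.txt,1
b.txt,2
a.txt,3
'@ | ConvertFrom-Csv

# Approach from the answer: group on the column, keep the first row of each group.
$viaGroup = $rows | Group-Object InputFilename | ForEach-Object { $_.Group[0] }

# Alternative: HashSet.Add() returns $false for a key it has already seen,
# so Where-Object passes only the first row for each InputFilename.
# The comparer makes the match case-insensitive, like Group-Object's default.
$seen = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::OrdinalIgnoreCase)
$viaSet = $rows | Where-Object { $seen.Add($_.InputFilename) }
```

The HashSet variant streams rows through the pipeline in their original order and avoids building every group in memory, which may matter for a large accounting file.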

How can I alternate column headers in a tab delimited file?

I have a tab-delimited txt file and I need to switch the first and second column names (without switching the columns' data). In other words, I need to rename A(Id) to B(ExternalId) and B(ExternalId) to A(Id). Other columns in the file (other data) should stay unchanged. I'm very new to PowerShell, please advise. As I understand it, I need to use the import/export csv cmdlets.
I tried this, but it's not working the right way...
Import-Csv 'C:\original_users.txt' |
Select-Object Id, @{Name="ExternalId";Expression={$_."Id"}}; Select-Object ExternalId, @{Name="Id";Expression={$_."ExternalId"}} |
Export-Csv 'C:\changed_users.txt'

The Import-CSV and Export-CSV cmdlets have their strengths but this might not be one of them. The latter cmdlet would introduce quoting that might not be in your original file and that might not be desired.
Either way, why not just do some text manipulation on the first line? Let's read in the file and output the edited first line followed by the remainder of the file. This sample writes to a new location, but you could easily write back to the same file.
# Get the full file into a variable
$fullFile = Get-Content "c:\temp\mockdata.csv"
# Parse the first line into a column array
$columns = $fullFile[0].Split("`t")
# Rebuild the header by switching the columns order as desired.
$newHeader = ($columns[1],$columns[0] + @($columns | Select-Object -Skip 2)) -join "`t"
# Write the header back to file then the rest of the data.
$outputPath = "C:\somepath.txt"
$newHeader | Set-Content $outputPath
$fullFile | Select-Object -Skip 1 | Add-Content $outputPath
This also preserves the presence of other columns and their data.
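To illustrate the quoting point made at the start of this answer, here is a tiny sketch with a made-up row. ConvertTo-Csv is used because it produces the same text Export-Csv would write, without needing a file.

```powershell
# Made-up row; property names mirror the columns from the question.
$row = [pscustomobject] @{ Id = 1; ExternalId = 'A1' }

# With default settings, every header and value comes back wrapped in double quotes.
$csvLines = $row | ConvertTo-Csv -Delimiter "`t" -NoTypeInformation
$csvLines
```

PowerShell 7 and later add a -UseQuotes parameter to these cmdlets that can change this; on Windows PowerShell 5.1 the text manipulation shown above avoids the issue.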

Learning PowerShell. Create usernames no longer than 8 characters and check for collision

I'm learning powershell right now.
I need to import a CSV like this:
lastname,firstname
lastname,firstname
lastname,firstname
etc
Then create a list of usernames no longer than 8 characters and check for collisions.
I have found bits and pieces of scripting around but not sure how to tie it all together.
I use Import-Csv to import my file.csv:
$variablename = import-csv C:\path\to\file.csv
but then I am not sure if I just import it into an array or not. I am not familiar with how for loops work in powershell exactly.
Any direction? Thanks.

There are a couple of concepts that are central to understanding PowerShell. Firstly, remember that you are always working with objects. So after importing your CSV file, your $variablename will refer to a collection of sub-objects.
Secondly, you can use the PowerShell pipeline to send the output of one cmdlet to the input of another. Some cmdlets will understand if you send them a collection, and automatically process each row.
I think what you're looking for, though, is the ForEach-Object cmdlet, which will allow you to run code against each item in the collection. Code inside the ForEach-Object block can refer to the $_ automatic variable, which will contain the current object.
Assuming your CSV file is well formatted and has a header row with the column names, you can refer to each column by name e.g. $_.lastname & $_.firstname.
To put it all together:
import-csv C:\path\to\file.csv |
    foreach-object {
        write-host "Processing: $($_.lastname), $($_.firstname)"
        # logic here to calculate username and create AD account
    }
PowerShell can have a bit of a learning curve if you are coming from a different scripting environment. Here are a couple of resources that I've found helpful:
PowerShell 'gotchas' http://www.rlmueller.net/PSGotchas.htm
Keith Hill's Effective PowerShell: https://rkeithhill.wordpress.com/2009/03/08/effective-windows-powershell-the-free-ebook/
Also, check out the Technet Script Center, where there are many hundreds of Active Directory scripts. https://technet.microsoft.com/en-us/scriptcenter/bb410849.aspx

The script below should help you grasp a few concepts on how to work with csvs and manipulate data using PowerShell.
# the code below uses a 'here string' to mimic the import of a csv.
$users = @'
smith,b
smith,bob
smith,bobby
smith,sonny
smithson,john
smithson,jane
smithers,rob
'@ -split "`r*`n"
$users |
    ConvertFrom-Csv -Header 'surname','firstname' |
    Select-Object @{Name='username'; Expression={"$($_.surname)$($_.firstname)        "}}, surname, firstname |
    Group-Object { $_.username.Substring(0,8).Trim() } |
    Select-Object @{Name='username'; Expression={$_.Name}}, Count |
    Format-Table -AutoSize
The $users | line takes the list of $users and pipes into the next command.
The ConvertFrom-Csv -Header... line converts the strings into objects with surname and firstname properties.
The Select-Object @{Name... line creates a calculated property named username, which concatenates surname and firstname. You'll notice the 8 extra spaces we append to the end of the string, so we know the string will have at least 8 characters.
The Group-Object {... line groups the username, using the first 8 characters, if available. The .Trim() gets rid of any trailing spaces.
The Select-Object @{Name='username'... line takes the Name field from the Group-Object output, renames it to username, and also shows the count from the grouping operation.
The Format-Table -AutoSize line is purely for output formatting to the console and gives you an output like the one below.
username Count
-------- -----
smithb       1
smithbob     2
smithson     3
smithers     1
Below is an amended version of the above code, which you can use on your real csv. Change the surname, firstname column names to suit your csv.
# you would use the code below, to import your list of names
# uncomment the `# -Header surname,firstname` bit if your csv has no headers
$users = Import-Csv -Path 'c:\path\to\names.csv' # -Header surname,firstname
$users |
    Select-Object @{Name='username'; Expression={"$($_.surname)$($_.firstname)        "}}, surname, firstname |
    Group-Object { $_.username.Substring(0,8).Trim() } |
    Select-Object @{Name='username'; Expression={$_.Name}}, Count
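The grouping above reports collisions but does not resolve them. The following is a minimal sketch of one way to do that, under an assumed naming scheme that is not from the question: lastname plus firstname, lower-cased, cut to 8 characters, with the tail replaced by a counter when a name is already taken. The sample names are invented.

```powershell
# Invented sample data with the column names from the question.
$people = @'
lastname,firstname
Smith,Bob
Smith,Bobby
Smithson,John
'@ | ConvertFrom-Csv

$taken = @{}   # usernames already handed out
$accounts = foreach ($p in $people) {
    # Assumed scheme: lastname + firstname, lower-cased, at most 8 characters.
    $base = ($p.lastname + $p.firstname).ToLower()
    if ($base.Length -gt 8) { $base = $base.Substring(0, 8) }

    $candidate = $base
    $n = 1
    while ($taken.ContainsKey($candidate)) {
        # Collision: make room for the counter so the result stays within 8 characters.
        $suffix = "$n"
        $keep = [Math]::Min($base.Length, 8 - $suffix.Length)
        $candidate = $base.Substring(0, $keep) + $suffix
        $n++
    }
    $taken[$candidate] = $true

    [pscustomobject] @{
        lastname  = $p.lastname
        firstname = $p.firstname
        username  = $candidate
    }
}
$accounts | Format-Table -AutoSize
```

For the sample data this yields smithbob, smithbo1 and smithson. A real script would also need to check the candidate against accounts that already exist in the directory.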

Import-Csv include empty fields in end of row

Edit
I'll conclude that Import-Csv is not ideal for incorrectly formatted CSV files and will use Get-Content and split instead. Thanks for all the answers.
Example CSV:
"SessionID","ObjectName","DatabaseName",,,,,,,,
"144","","AC"
Using Import-Csv none of the empty fields at the end will be counted - it will simply stop after "DatabaseName".
Is there any way to include the empty fields?
Edit:
I simply need to count the fields and make sure there are fewer than X of them. It is not only the header that might contain empty fields but also the content. These files are often manually made and not properly formatted. Since the files can also get very large, I would prefer not to use Get-Content and split as well, since I'm already using Import-Csv and its properties.

Looks like it's missing its headers. If you would add some, it would work fine.
You could do something like
Get-Content My.CSV | Select -skip 1 | ConvertFrom-Csv -Header "SessionID","ObjectName","DatabaseName",'Whatnot1', 'Whatnot2', 'Whatnot3'
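For the field count itself, which the question's edit says ended up being done with Get-Content and split, here is a minimal sketch that splits each line on commas outside double quotes, so a quoted value such as "a,b" still counts as one field. The sample lines are taken from the question plus one invented line, $maxFields is a hypothetical stand-in for X, and the sketch assumes quoted fields never contain line breaks.

```powershell
# Sample lines; in practice these would come from Get-Content.
$lines = @(
    '"SessionID","ObjectName","DatabaseName",,,,,,,,'
    '"144","","AC"'
    '"1","a,b","c"'
)

# Match a comma only when an even number of double quotes follows it,
# i.e. when the comma sits outside any quoted field.
$separator = ',(?=(?:[^"]*"[^"]*")*[^"]*$)'
$maxFields = 5   # hypothetical limit standing in for "X"

$report = foreach ($line in $lines) {
    $count = ($line -split $separator).Count
    [pscustomobject] @{
        Fields      = $count
        WithinLimit = ($count -lt $maxFields)
        Line        = $line
    }
}
$report | Format-Table -AutoSize
```

The lookahead rescans the rest of the line for every comma, so for very large files a streaming parser may be a better fit than this regex.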

As dbso suggested, split and Length will help you. I was in the middle of coding a header routine, which is now obsolete. Nevertheless, here it is:
$FileIn = "Q:\test\2017-01\06\SO_41505840.csv"
$Header= (Get-Content $FileIn|select -first 1)-split(",")
"Fieldcount for $FileIn is $($Header.Length)"
for($i=0; $i -lt $Header.Length; $i++){if ($Header[$i] -eq ""){$Header[$i]="`"Column$($i+1)`""}}
$Header -Join(",")
Returning this output
Fieldcount for Q:\test\2017-01\06\SO_41505840.csv is 11
"SessionID","ObjectName","DatabaseName","Column4","Column5","Column6","Column7","Column8","Column9","Column10","Column11"
