PowerShell - Querying CSV column names and writing them to a new CSV

So I'm trying to look at column names in a CSV, write them to an array and then spit the data back out into a new CSV with a new column attached. I don't really care what the current data table looks like so long as I can add a column to the headers. This seems like a fairly basic thing to be able to do but I can't seem to find any situations where anyone has done this. For example:
"Name","Location","Phone"
"John Smith","Toronto","555.555.5555"
"Jane Doe","Dallas","555.555.5554"
I just want to keep the Name, Location and Phone column names and put them into another CSV with one extra column. The catch is, in reality, there are more columns and they aren't always the same so the script needs to be able to be column name-agnostic. I should be able to feed it any CSV with any number of columns with different names and be able to get an output file with just the column names. I've tried at least a dozen different ways to do this and keep bumping into different issues.
Example:
$validatedfilepath = "validfile.csv"
$csvData = New-Object PSObject
$csvData = Import-CSV "file.csv" | Add-Member @{ID="$null"} -PassThru
$csvdata = $csvdata | get-member -Name * -MemberType NoteProperty
$csvData = @($csvData.Name)
$csvData
That will show me the exact list of column names that I want in my new validfile but I have no idea how to export it into a CSV as the column names. Each time I've tried doing export-csv, it either gives me the character length count in a new row for each or some other goofy stuff.
Thanks

Export-Csv reads the column names from the properties of the input objects, so you are already on the right track. You just need to simplify a bit, like this:
$csvData = Import-CSV "file.csv" | Add-Member @{ID="$null"} -PassThru
# You can now modify/enrich/filter the data somehow, for example like $csvdata[0].ID = 56387, then just export it:
$csvData | Export-CSV -NoTypeInformation -Path "newfile.csv"
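Your mistake was overwriting the imported data in $csvData with the Get-Member result, and then overwriting it again with the @($csvData.Name) expression, which leaves only an array of strings. Export-Csv exports the properties of each input object, and the only property of a string is Length, which is why you got character counts.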
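If the goal really is a file that contains only the header row plus one extra column, a minimal sketch could look like the following. The temp-folder paths and the sample rows are assumptions made up for illustration, and the code relies on Import-Csv creating properties in the same order as the columns in the file (PowerShell v3 or later).

```powershell
# Hypothetical sample input, mirroring the data in the question
$inputPath  = Join-Path ([System.IO.Path]::GetTempPath()) 'file.csv'
$outputPath = Join-Path ([System.IO.Path]::GetTempPath()) 'validfile.csv'
@'
"Name","Location","Phone"
"John Smith","Toronto","555.555.5555"
"Jane Doe","Dallas","555.555.5554"
'@ | Set-Content -Path $inputPath

# Read the column names from the first imported row, then append the new column
$firstRow = Import-Csv -Path $inputPath | Select-Object -First 1
$columns  = @($firstRow.PSObject.Properties.Name) + 'ID'

# Build a quoted header line and write it as the only line of the new file
$header = ($columns | ForEach-Object { '"{0}"' -f $_ }) -join ','
Set-Content -Path $outputPath -Value $header
```

Because the names come from whatever the first row exposes, the same lines should work for any CSV regardless of how many columns it has or what they are called.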

Related

Powershell remove a column from csv only if a word is present

I have a csv with columns A, B, C. I would like to import that to powershell and only on column C, remove any rows that have the word "Unknown" listed. If Column A or B has "Unknown", they stay, however, if Column C has it, the entire row gets deleted. Per the picture below, Row 4 would be deleted.
Can someone please provide a sample script to do this?
Thanks!
So, you have 3 problems you need to solve:
Import the data from the CSV file
Filter it based on the value of column C
Export the filtered data to a file again
To import, use the aptly named Import-Csv cmdlet:
$data = Import-Csv .\path\to\file.csv
Import-Csv will parse the CSV file and for each row it reads, it will output 1 object with properties corresponding to the column names in the header row.
To filter these objects based on the value of their C property, use the Where-Object cmdlet:
$filteredData = $data |Where-Object C -ne 'Unknown'
Where-Object will test whether the C property on each object does not have the value 'Unknown' (-ne = not equals), and discard any object for which that's not the case.
To re-export the filtered data, use the Export-Csv cmdlet:
$filteredData |Export-Csv .\path\to\output.csv -NoTypeInformation
You can also combine all three statements into a single pipeline expression:
Import-Csv .\path\to\file.csv |Where-Object C -ne 'Unknown' |Export-Csv .\path\to\output.csv -NoTypeInformation
This "one-liner" approach might be preferable if you're working on large CSV files (> hundreds of thousands of records), as it doesn't require reading the entire CSV file into memory at once.
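To see the filter in isolation, here is a small sketch that uses inline sample data instead of a file. The rows are invented for illustration; only the column names A, B and C come from the question.

```powershell
# Hypothetical rows: "Unknown" appears once in each column
$rows = @'
A,B,C
Unknown,x,keep
y,Unknown,keep too
z,w,Unknown
'@ | ConvertFrom-Csv

# Only the value in column C decides whether a row survives
$filtered = @($rows | Where-Object C -ne 'Unknown')
$filtered | Format-Table
```

The rows with "Unknown" in column A or B are kept; only the row with "Unknown" in column C is dropped.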
$Data = Get-Content "C:\file.csv" | ConvertFrom-Csv
$Data | Where-Object {$_.C -ne 'Unknown'} | Export-Csv "C:\file_New.csv" -NoTypeInformation

How can I alternate column headers in a tab delimited file?

I have a tab-delimited txt file and I need to switch the first and second column names (without switching the columns' data). In other words, I need to rename A (Id) to B (ExternalId) and B (ExternalId) to A (Id). The other columns in the file (and their data) should stay unchanged. I'm very new to PowerShell, so please advise. As I understand it, I need to use the Import-Csv/Export-Csv cmdlets.
I tried this, but it's not working the right way...
Import-Csv 'C:\original_users.txt' |
Select-Object Id, @{Name="ExternalId";Expression={$_."Id"}}; Select-Object ExternalId, @{Name="Id";Expression={$_."ExternalId"}} |
Export-Csv 'C:\changed_users.txt'
The Import-CSV and Export-CSV cmdlets have their strengths but this might not be one of them. The latter cmdlet would introduce quoting that might not be in your original file and that might not be desired.
Either way, why not just do some text manipulation on the first line? Let's read in the file, then output the edited first line followed by the remainder of the file. This sample writes to a new location, but you could easily write it back to the same file.
# Get the full file into a variable
$fullFile = Get-Content "c:\temp\mockdata.csv"
# Parse the first line into a column array
$columns = $fullFile[0].Split("`t")
# Rebuild the header by switching the columns order as desired.
$newHeader = ($columns[1],$columns[0] + ($columns | Select-Object -Skip 2)) -join "`t"
# Write the header back to file then the rest of the data.
$outputPath = "C:\somepath.txt"
$newHeader | Set-Content $outputPath
$fullFile | Select-Object -Skip 1 | Add-Content $outputPath
This also preserves the presence of other columns and their data.
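A self-contained variant of the same idea, as a rough sketch: it creates a small tab-delimited file in the temp folder (the file names and sample rows are assumptions for illustration) and swaps the first two header names in place, which also behaves sensibly when the file has only two columns.

```powershell
# Hypothetical sample file with three tab-delimited columns
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'original_users.txt'
$out  = Join-Path ([System.IO.Path]::GetTempPath()) 'changed_users.txt'
"Id`tExternalId`tName", "1`tA100`tAnn", "2`tB200`tBob" | Set-Content -Path $path

# Read all lines and split only the header line into column names
$lines   = @(Get-Content -Path $path)
$columns = $lines[0].Split("`t")

# Swap the first two names; every other name stays where it is
$temp       = $columns[0]
$columns[0] = $columns[1]
$columns[1] = $temp

# Write the new header, then the untouched data lines
($columns -join "`t") | Set-Content -Path $out
$lines | Select-Object -Skip 1 | Add-Content -Path $out
```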

How to import CSV column? Import-Csv not working as planned

I have a drop down list that's being populated by a CSV file. There are many columns in the CSV, but it only pulls from the Name column. Here's what I have that works just fine in most Win 7, and all Win 8+ PC's I've tried it on.
$customers = Import-CSV "$dir\Apps\customers.csv"
$List = $customers.name | Sort-Object
After that there's a ForEach loop to put each item from the list into the menu.
Lately I've been noticing an issue on a couple Win 7 PC's that I can't figure out. The import option doesn't work unless I specify all the headers with the -Header option. I get this error:
After getting it to import correctly by adding all the headers I can't get it to save $customers.name into the $List variable, with or without the sorting. However, if I provide an index number (ex $customers[2].name) it works.
To work around this I've looked at ways to measure the number of rows in the CSV by using the following options after $customers:
$csvlength = Get-Content "$dir\Apps\customers.csv" | Measure-Object -Line
or
$csvlength = Import-CSV "$dir\Apps\customers.csv" | Measure-Object
From there I can see the length by looking at $csvlength.lines or $csvlength.count.
Is there a way to use that information to save the list of names into $List? I've tried something like this with no success:
$List = $customers[0-$csvlength.count] | Sort-Object
Also, I've noticed that when importing the headers it includes Name in the list. If at all possible I'd like to not include the header. I also have a line at the end of the CSV that has other info in it, but no name. That shows up as a blank line. If possible I'd like that to be removed as well.
PowerShell v2 $array.Foo only allows access to a property Foo of the array object itself, not to a property Foo of the array elements. Allowing access to element properties via the array variable is one of the major changes that was introduced with PowerShell v3.
To work around this limitation in v2 you need to expand the property before sorting:
$List = $customers | Select-Object -Expand name | Sort-Object
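The blank entry from the trailing line without a name can be dropped in the same pipeline. The following is a small sketch with made-up data standing in for customers.csv; it sticks to cmdlets and syntax that should also be available in PowerShell v2.

```powershell
# Hypothetical stand-in for customers.csv: the last row has other info but no name
$customers = @'
Name,City
Zed,Austin
Amy,Boston
,Footer info
'@ | ConvertFrom-Csv

# Keep only rows with a non-empty Name, expand the property, then sort
$List = $customers |
    Where-Object { $_.Name } |
    Select-Object -ExpandProperty Name |
    Sort-Object
```

An empty string evaluates to false in Where-Object, so the row without a name is filtered out before the names are expanded.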

Using duplicate headers in Powershell .csv file

I have a .csv file and I want to import it into powershell then iterate through the file changing certain values. I then want the output to append to the original .csv file, so that the values have been updated.
My issue is that the .csv file has headers which aren't unique, and can't be changed as then it won't work in another program. Originally I defined my own headers in the powershell to get around this but then the output file has these new headers when it needs to have the old ones.
I have also tried ConvertFrom-Csv which means I can no longer access the columns I need to, so lots of runtime errors.
What would be ideal is to be able to use the defined column headers and then convert back to the original column headers. My current code is below:
$csvfile = Import-Csv C:\test.csv| Where-Object {$_.'3' -eq $classID} | ConvertFrom-Csv
foreach($record in $csvfile){
*do something*}
$csvfile | Export-Csv -path C:\test.csv -NoTypeInformation -Append
I've searched the web now for some hours and tried everything I've come across, to no avail.
Thanks in advance.
This is a somewhat hackish implementation but should work.
Remove all the headers as a single line and save it somewhere
Parse the new result-set (with the headers removed)
Add the line at the top when you are finished
A CSV is just a comma-delimited text file; you don't have to treat it as structured data. Feel free to slice and dice it as you want.
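The three steps above could be sketched roughly as follows. The sample file, the generated Col0, Col1, ... names and the edit inside the loop are all assumptions for illustration, and the column count is taken from a naive split on commas, which would miscount if a header name itself contained a comma.

```powershell
# Hypothetical file whose header contains a duplicate name
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'dupes.csv'
'Val,Val,Class', '1,2,A', '3,4,B' | Set-Content -Path $path

# 1. Keep the original header line verbatim
$lines  = @(Get-Content -Path $path)
$header = $lines[0]

# 2. Parse the remaining lines using generated, unique column names
$names = 0..($header.Split(',').Count - 1) | ForEach-Object { "Col$_" }
$rows  = $lines | Select-Object -Skip 1 | ConvertFrom-Csv -Header $names
foreach ($row in $rows) {
    if ($row.Col2 -eq 'A') { $row.Col0 = '99' }   # placeholder for the real edit
}

# 3. Write the original header back, followed by the data without the generated header
$body = $rows | ConvertTo-Csv -NoTypeInformation | Select-Object -Skip 1
Set-Content -Path $path -Value (@($header) + $body)
```

Note that ConvertTo-Csv quotes every field by default, so the data lines come back quoted even though the original header line is written back unchanged.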
Since you know beforehand how many columns are in the input CSV file, you can import without the header and process internally. Example:
$columns = 78
Import-Csv "inputfile.csv" -Header (0..$($columns - 1)) | Select-Object -Skip 1 | ForEach-Object {
    $row = $_
    $outputObject = New-Object PSObject
    0..$($columns - 1) | ForEach-Object {
        $outputObject | Add-Member NoteProperty "Col$_" $row.$_
    }
    $outputObject
} | Export-Csv "outputfile.csv" -NoTypeInformation
This example generates new PSObjects and then outputs a new CSV file with generic column names (Col0, Col1, etc.).

Reformat column names in a csv with PowerShell

Question
How do I reformat an unknown CSV column name according to a formula or subroutine (e.g. rename column " Arbitrary Column Name " to "Arbitrary Column Name" by running a trim or regex or something) while maintaining data?
Goal
I'm trying to more or less sanitize columns (the names) in a hand-produced (or at least hand-edited) csv file that needs to be processed by an existing PowerShell script. In this specific case, the columns have spaces that would be removed by a call to [String]::Trim(), or which could be ignored with an appropriate regex, but I can't figure a way to call or use those techniques when importing or processing a CSV.
Short Background
Most files and columns have historically been entered into the CSV properly, but recently a few columns were being dropped during processing; I determined it was because the files contained a space (e.g., Select-Object was being told to get "RFC", but Import-CSV retrieved "RFC ", so no matchy-matchy). Telling the customer to enter it correctly by hand (though preferred and much simpler) is not an option in this case.
Options considered
I could manually process the text of the file, but that is a messy and error prone way to re-invent the wheel. I wonder if there's a syntax with Select-Object that would allow a softer match for column names, but I can't find that info.
The closest I have come conceptually is using a calculated property in the call to Select-Object to rename the column, but I can only find ways to rename a known column to another known column. So, this would require enumerating the columns and matching them exactly (preferred) or a softer match (like comparing after trimming or matching via regex as a fallback) with expected column names, then creating a collection of name mappings to use in constructing calculated properties from that information to select into a new object.
That seems like it would work, but it's more work than I'd prefer, and I can't help but hope that there's a simpler way I haven't been able to find via Google. Maybe I should try Bing?
Sample File
Let's say you have a file.csv like this:
" RFC "
"1"
"2"
"3"
Code
Now try to run the following:
$CSV = Get-Content file.csv -First 2 | ConvertFrom-Csv
$FixedHeaders = $CSV.PSObject.Properties.Name.Trim(' ')
Import-Csv file.csv -Header $FixedHeaders |
Select-Object -Skip 1 -Property RFC
Output
You will get this output:
RFC
---
1
2
3
Explanation
First we use Get-Content with parameter -First 2 to get the first two lines. Piping to ConvertFrom-Csv will allow us to access the headers with PSObject.Properties.Name. Use Import-Csv with the -Header parameter to use the trimmed headers. Pipe to Select-Object and use -Skip 1 to skip the original headers.
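The same approach extends to files with several columns. The sketch below is one possible variation, not a definitive implementation: the file name, the column names and the extra collapsing of inner whitespace are assumptions added for illustration.

```powershell
# Hypothetical file with padded and double-spaced header names
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'headers.csv'
'" RFC ","Owner  Name "', '"1","Ann"', '"2","Bob"' | Set-Content -Path $path

# Parse just the header and one data line to discover the original names
$first = Get-Content -Path $path -TotalCount 2 | ConvertFrom-Csv

# Trim each name and collapse any run of whitespace into a single space
$fixed = $first.PSObject.Properties.Name |
    ForEach-Object { $_.Trim() -replace '\s+', ' ' }

# Re-import with the cleaned names, skipping the original header row
$data = @(Import-Csv -Path $path -Header $fixed | Select-Object -Skip 1)
$data | Format-Table
```

After this, the columns can be selected by their cleaned names, for example `$data | Select-Object RFC, 'Owner Name'`.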
I'm not sure about comparisons in terms of efficiency, but I think this is a little more hardened, and imports the CSV only once. You might be able to use @lahell's approach and Get-Content -raw, but this was done and it works, so I'm gonna leave it to the community to determine which is better...
# Import the CSV
$rawCSV = Import-Csv $Path
# Get the actual header names and map them to their reformatted versions
$CSVColumns = @{}
$rawCSV |
    Get-Member |
    Where-Object {$_.MemberType -eq "NoteProperty"} |
    Select-Object -ExpandProperty Name |
    Foreach-Object {
        # Map a trimmed, whitespace-reduced version of the name to the original name
        $CSVColumns.Add(($_.Trim() -replace '(\s)\s+', '$1'), "$_")
    }
# Create the array of names and calculated properties to pass to Select-Object
$SelectColumns = @()
$CSVColumns.GetEnumerator() |
    Foreach-Object {
        # Assign the if/else result directly; wrapping it in { } would add an unevaluated script block
        $SelectColumns += if ($CSVColumns.Values -contains $_.Key) {
            $_.Key
        } else {
            @{Name = $_.Key; Expression = $CSVColumns[$_.Key]}
        }
    }
$FormattedCSV = $rawCSV |
    Select-Object $SelectColumns
This was hand-copied to a computer where I don't have the rights to run it, so there might be an error. I tried to copy it correctly.
You can use gocsv (https://github.com/DataFoxCo/gocsv) to see the headers of the CSV. You can then rename the headers, behead the file, swap columns, join, merge, or apply any number of other transformations.