Select CSV columns in PowerShell where header name contains a specific string
I have a data file of about 10-15 columns from which I want to extract specific columns. For some of the columns I know the exact column header; for others I only know that the first two letters will always be "FC".
How do I select only the columns where I know the column header and those that start with "FC"?
Starting with just the "FC" columns, I have tried this:
$myCSV = Import-CSV "mydata.txt" -Delimiter "`t"
$FCcols = $myCSV[0].psobject.Properties | foreach { $_.Name } | Where {$_ -match "FC"}
$myCSV | select $FCcols
But I just get an error:
Select-Object : Cannot convert System.Management.Automation.PSObject to one of the following types {System.String, System.Management.Automation.ScriptBlock}.
At line:3 char:16
+ $myCSV | select <<<< $FCcols
+ CategoryInfo : InvalidArgument: (:) [Select-Object], NotSupportedException
+ FullyQualifiedErrorId : DictionaryKeyUnknownType,Microsoft.PowerShell.Commands.SelectObjectCommand
Then, if I try:
$myCSV = Import-CSV "mydata.txt" -Delimiter "`t"
$FCcols = [System.Collections.ArrayList]@()
$myCSV[0].psobject.Properties | foreach { $_.Name } | Where {$_ -match "FC"} | %{$FCcols.Add($_)}
$myCSV | select $FCcols
I get the output I want except that it is in "column header : value" format, like this:
FC1839 : 0
FC1842 : 1
FC1843 : 6
FC1844 : 12
FC1845 : 4
FC1839 : 0
FC1842 : 0
FC1843 : 19
FC1844 : 22
FC1845 : 14
I am probably just missing something simple, but how do I get to the point that I am able to select these matching columns and then output them to another .txt file (without the header : value format)?
First things first: Mathias R. Jessen's helpful tip not only solves your problem, but significantly simplifies the approach (and also works in PSv2):
$myCSV | Select-Object FC*
The (implied) -Property parameter supports wildcard expressions, so FC* matches all properties (column names) that start with FC.
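To check the wildcard behavior without a file, here is a minimal sketch that runs ConvertFrom-Csv on a few in-memory tab-delimited lines. The sample data is hypothetical, and the Date column stands in for a column whose exact name you know; it shows that wildcard patterns and literal names can be mixed in a single -Property list, which covers both kinds of columns from the question:

```powershell
# Hypothetical tab-delimited sample; Date stands in for a column whose exact name you know.
$tsv = @(
    "Date`tFC1839`tNotes`tFC1842"
    "2020-01-01`t0`tfirst`t1"
    "2020-01-02`t0`tsecond`t0"
)

# ConvertFrom-Csv parses in-memory lines the same way Import-Csv parses a file.
$rows = $tsv | ConvertFrom-Csv -Delimiter "`t"

# Mix a wildcard pattern (all columns starting with FC) with a literal column name.
$selected = $rows | Select-Object -Property FC*, Date

# Only FC1839, FC1842 and Date remain; Notes is dropped.
$selected | Format-Table
```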
As for the output format you're seeing: Because you're selecting 5 properties, PowerShell defaults to implicit Format-List formatting, with each property name-value pair on its own line.
To fix this display problem, pipe to Format-Table explicitly (which is what PowerShell would do implicitly if you had selected 4 or fewer properties):
$myCSV | Select-Object FC* | Format-Table
To re-export the results to a CSV (TSV) file:
Import-Csv mydata.txt -Delimiter "`t" | Select-Object FC* |
Export-Csv myresults.txt -Encoding Utf8 -Delimiter "`t" -NoTypeInformation
To do so without a header line:
Import-Csv mydata.txt -Delimiter "`t" | Select-Object FC* |
ConvertTo-Csv -Delimiter "`t" -NoTypeInformation | Select-Object -Skip 1 |
Set-Content myresults.txt -Encoding Utf8
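To see exactly what the header-less variant produces, here is a small sketch that runs the same ConvertTo-Csv / Select-Object -Skip 1 pipeline on hypothetical in-memory rows instead of mydata.txt. Note that ConvertTo-Csv still wraps every value in double quotes; in PowerShell 7+ you could add -UseQuotes AsNeeded to avoid that, an option Windows PowerShell does not have.

```powershell
# Hypothetical sample rows standing in for mydata.txt.
$rows = @(
    "Date`tFC1839`tFC1842"
    "2020-01-01`t0`t1"
    "2020-01-02`t0`t0"
) | ConvertFrom-Csv -Delimiter "`t"

# Same pipeline as above, minus the file I/O: convert to TSV lines and drop the header line.
$lines = $rows | Select-Object FC* |
    ConvertTo-Csv -Delimiter "`t" -NoTypeInformation |
    Select-Object -Skip 1

# Each remaining line holds one row's FC values, with each value in double quotes.
$lines
```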
As for your specific symptom:
The problem occurs only in PSv2, and it smells like a bug to me.
The workaround is to make your column-name array a strongly typed string array ([string[]]):
[string[]] $FCcols = $myCSV[0].psobject.Properties | % { $_.Name } | ? { $_ -match '^FC' }
Note that, for brevity, I've used built-in alias % in lieu of ForEach-Object and ? in lieu of Where-Object.
Also note that the regex passed to -match was changed to ^FC to ensure that only columns that start with FC are matched.
Your code works as-is in PSv3+, but can be simplified:
$FCcols = $myCSV[0].psobject.Properties.Name -match "^FC"
Note how .Name is applied directly to .psobject.Properties, which in v3+ retrieves the .Name property of each item in the collection, a feature called member-access enumeration.
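A quick sketch of both points (member-access enumeration and the anchored regex) on hypothetical in-memory data; the XFC9 column is made up to show a name that contains FC without starting with it. Requires v3+:

```powershell
# Hypothetical sample; XFC9 contains "FC" but does not start with it.
$rows = @(
    "Date,FC1839,XFC9,FC1842"
    "2020-01-01,0,7,1"
    "2020-01-02,0,3,0"
) | ConvertFrom-Csv

# Member-access enumeration: .Name is read from every property object, in column order.
$allNames = $rows[0].psobject.Properties.Name

# With an array on the left, -match acts as a filter and returns the matching elements.
$unanchored = $allNames -match 'FC'    # also picks up XFC9
$anchored   = $allNames -match '^FC'   # only names that start with FC

$rows | Select-Object $anchored
```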
I would use Get-Member to get your columns, something like this:
$myCSV = Import-CSV "mydata.txt" -Delimiter "`t"
$myCSV | select ($myCSV | gm -MemberType NoteProperty | ? {$_.Name -match '^FC'}).Name
Mathias's helpful comment is the best way to go for selecting: simple and elegant. I didn't know it was an option.
$myCSV | Select FC*,ColumnIKnowTheNameOf
I believe you need to add Export-Csv to answer your last question. Here's another approach I had already worked on; it uses Get-Member and NoteProperty, which helps if you need to interrogate CSV or similar objects in the future.
$myCSV = Import-CSV "mydata.txt" -Delimiter "`t"
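# You can get the headings by using Get-Member and selecting the NoteProperty members.
# Note: Get-Member sorts the names alphabetically, so the original column order is not kept.
# The [string[]] constraint keeps $FCcols an array even if only one column matches.
[string[]] $FCcols = $myCSV |
Get-Member |
Where-Object {$_.MemberType -eq "NoteProperty" -and $_.Name -match "^FC"} |
Select-Object -ExpandProperty Name
# Add the names of the columns you know exactly to this array.
$FCcols += "ColumnIKnowTheNameOf"
$myCSV | Select-Object $FCcols
# To get a tab-delimited file similar to the one you imported, pipe the selection to Export-Csv.
$myCSV | Select-Object $FCcols | Export-Csv "myresults.txt" -Delimiter "`t" -NoTypeInformation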
I finally came up with a "quick and dirty" solution which I'm disappointed to not have figured out earlier.
$myCSV = Import-CSV "mydata.txt" -Delimiter "`t" | select FC*
for ($i = 0; $i -lt $myCSV.count; $i++) {
    $writeline = ($myCSV[$i] | %{ $_.PSObject.Properties | %{ $_.Value } }) -join "`t"
    ac "myresults.txt" $writeline -Encoding utf8
}
The first line gives me the columns I want; the for loop then collects the property values of each row and joins them into a tab-separated line, and finally each line is appended to the text file.
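The loop can also be written as a single pipeline. Here is a sketch on hypothetical in-memory rows (v3+ for the .Value member-access enumeration). Like the loop, it writes values without quotes, but it also does no escaping, so a value that itself contains a tab would shift the columns:

```powershell
# Hypothetical sample rows standing in for mydata.txt.
$rows = @(
    "Date`tFC1839`tFC1842"
    "2020-01-01`t0`t1"
    "2020-01-02`t0`t0"
) | ConvertFrom-Csv -Delimiter "`t"

# For each row, join the values of the selected FC* properties with tabs.
$lines = $rows | Select-Object FC* | ForEach-Object {
    $_.PSObject.Properties.Value -join "`t"
}

# To write a file instead, pipe $lines to: Set-Content myresults.txt -Encoding utf8
$lines
```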
This may not be the pedagogically correct way to achieve the result, but it works so far.
Thanks to everyone for their input!
Related
Filtering Data By Property
I'm searching for a way to filter entries in a CSV file in PowerShell. My file looks like this:
Header1;Header2;Header3
Tom;15;15.12.2008
Anna;17;
Tim;18;12.01.2007
My code currently looks like this:
$altdaten = Get-Content -Path $altdatenpf | Select-Object -skip 1 |
    ConvertFrom-Csv -Delimiter ";" -Header $categoriesCSV
$neudaten = Get-Content -Path $neudatenpf | Select-Object -skip 1 |
    ConvertFrom-Csv -Delimiter ";" -Header $categoriesCSV
$zdaten = foreach ($user in $neudaten) { Where-Object $user.Austrittsdatum -EQ '' }
$zdaten | export-Csv -Path '.\Test\zwischendaten.csv'
In this case I want to delete all entries like Tim and Tom, which have a value in Header3.
Thank you in advance.
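You can try something like this:
$altdaten = Get-Content -Path .\pathToCsvFile.csv | ConvertFrom-Csv -Delimiter ';'
$zdaten = $altdaten | Where-Object { $_.Header3 -eq $null }
Note that Where-Object has to receive the rows through the pipeline; called inside a foreach loop without input, it filters nothing.
Your equality operator actually checks for an empty string. An empty field at the end of a row (as in Anna;17;) typically comes through as an empty string, whereas a field that is missing from the row entirely comes through as $null, so which check you need depends on your data. If you are not sure, you can use:
$zdaten = $altdaten | Where-Object { [string]::IsNullOrEmpty($_.Header3) }
This covers either case and also reads nicely (to me at least).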
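A sketch that makes the difference visible, assuming ConvertFrom-Csv's usual behavior of returning an empty string for an empty field and $null for a field missing from the row. The Bob row is hypothetical, added only to show the missing-field case:

```powershell
# Sample rows from the question plus a hypothetical "Bob" row that lacks the third field entirely.
$data = @(
    'Header1;Header2;Header3'
    'Tom;15;15.12.2008'
    'Anna;17;'
    'Bob;20'
    'Tim;18;12.01.2007'
) | ConvertFrom-Csv -Delimiter ';'

$emptyString = $data | Where-Object { $_.Header3 -eq '' }                    # Anna: empty field
$nullValue   = $data | Where-Object { $_.Header3 -eq $null }                 # Bob: missing field
$either      = $data | Where-Object { [string]::IsNullOrEmpty($_.Header3) }  # both

$either | Format-Table
```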
Getting only repeating files from a directory and its subdirectories
I'm trying to write a script for finding non-unique files. The script should take one .csv file with the data: file name, LastWriteTime and Length. Then I try to make another .csv based on that one, which will contain only those objects whose combination of Name+Length+LastWriteTime is NON-unique.
I tried the following script, which uses $csvfile containing the file list:
$csvdata = Import-Csv -Path $csvfile -Delimiter '|'
$csvdata |
    Group-Object -Property Name, LastWriteTime, Length |
    Where-Object -FilterScript { $_.Count -gt 1 } |
    Select-Object -ExpandProperty Group -Unique |
    Export-Csv $csvfile2 -Delimiter '|' -NoTypeInformation -Encoding Unicode
$csvfile was created by:
{
    Get-ChildItem -Path $mainFolderPath -Recurse -File |
        Sort-Object $sortMode |
        Select-Object Name, LastWriteTime, Length, Directory |
        Export-Csv $csvfile -Delimiter '|' -NoTypeInformation -Encoding Unicode
}
(Get-Content $csvfile) | ForEach-Object { $_ -replace '"' } | Out-File $csvfile -Encoding Unicode
But somehow $csvfile2 contains only one (the first) non-unique record. Does anyone have an idea how to improve it so that it lists all non-unique records?
You need to use -Property * -Unique to get a list of unique objects. However, you cannot use -Property and -ExpandProperty at the same time here, because you want the latter parameter to apply to the input objects ($_) and the former to apply to an already expanded property of those input objects ($_.Group). Expand the property Group first, then select the unique objects:
... | Select-Object -ExpandProperty Group | Select-Object -Property * -Unique | ...
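A self-contained sketch of the corrected pipeline on a hypothetical in-memory file list (no CSV round-trip). The entries are invented: a.txt and b.txt are duplicated across folders, one a.txt entry appears twice verbatim so that -Unique has something to remove, and c.txt is unique:

```powershell
# Hypothetical file list; Directory distinguishes otherwise identical entries.
$files = @(
    [pscustomobject]@{ Name = 'a.txt'; LastWriteTime = '2020-01-01'; Length = 10; Directory = 'C:\one' }
    [pscustomobject]@{ Name = 'a.txt'; LastWriteTime = '2020-01-01'; Length = 10; Directory = 'C:\two' }
    [pscustomobject]@{ Name = 'a.txt'; LastWriteTime = '2020-01-01'; Length = 10; Directory = 'C:\two' }  # exact repeat
    [pscustomobject]@{ Name = 'b.txt'; LastWriteTime = '2020-01-02'; Length = 20; Directory = 'C:\one' }
    [pscustomobject]@{ Name = 'b.txt'; LastWriteTime = '2020-01-02'; Length = 20; Directory = 'C:\three' }
    [pscustomobject]@{ Name = 'c.txt'; LastWriteTime = '2020-01-03'; Length = 30; Directory = 'C:\one' }
)

$dupes = $files |
    Group-Object -Property Name, LastWriteTime, Length |
    Where-Object { $_.Count -gt 1 } |
    Select-Object -ExpandProperty Group |   # flatten the groups back into individual records
    Select-Object -Property * -Unique       # then drop records that are identical in every property

$dupes | Format-Table
```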
Export-Csv adding unwanted header double quotes
I have a source CSV file (without a header, all columns delimited by a comma) which I am trying to split into separate CSV files based on the value in the first column, using that column value as the output file name.
Input file:
S00000009,2016,M04 01/07/2016,0.00,0.00,0.00,0.00,0.00,0.00,750.00,0.00,0.00
S00000009,2016,M05 01/08/2016,0.00,0.00,0.00,0.00,0.00,0.00,600.00,0.00,0.00
S00000009,2016,M06 01/09/2016,0.00,0.00,0.00,0.00,0.00,0.00,600.00,0.00,0.00
S00000010,2015,W28 05/10/2015,2275.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
S00000010,2015,W41 04/01/2016,0.00,0.00,0.00,0.00,0.00,0.00,568.75,0.00,0.00
S00000010,2015,W42 11/01/2016,0.00,0.00,0.00,0.00,0.00,0.00,568.75,0.00,0.00
S00000012,2015,W10 01/06/2015,0.00,0.00,0.00,0.00,0.00,0.00,650.00,0.00,0.00
S00000012,2015,W11 08/06/2015,0.00,0.00,0.00,0.00,0.00,0.00,650.00,0.00,0.00
S00000012,2015,W12 15/06/2015,0.00,0.00,0.00,0.00,0.00,0.00,650.00,0.00,0.00
My PowerShell script looks like this:
Import-Csv INPUT_FILE.csv -Header service_id,year,period,cash_exp,cash_inc,cash_def,act_exp,act_inc,act_def,comm_exp,comm_inc,comm_def |
    Group-Object -Property "service_id" |
    Foreach-Object {
        $path = $_.Name + ".csv"
        $_.group | Export-Csv -Path $path -NoTypeInformation
    }
Output files:
S00000009.csv:
"service_id","year","period","cash_exp","cash_inc","cash_def","act_exp","act_inc","act_def","comm_exp","comm_inc","comm_def"
"S00000009","2016","M04 01/07/2016","0.00","0.00","0.00","0.00","0.00","0.00","750.00","0.00","0.00"
"S00000009","2016","M05 01/08/2016","0.00","0.00","0.00","0.00","0.00","0.00","600.00","0.00","0.00"
"S00000009","2016","M06 01/09/2016","0.00","0.00","0.00","0.00","0.00","0.00","600.00","0.00","0.00"
S00000010.csv:
"service_id","year","period","cash_exp","cash_inc","cash_def","act_exp","act_inc","act_def","comm_exp","comm_inc","comm_def"
"S00000010","2015","W28 05/10/2015","2275.00","0.00","0.00","0.00","0.00","0.00","0.00","0.00","0.00"
"S00000010","2015","W41 04/01/2016","0.00","0.00","0.00","0.00","0.00","0.00","568.75","0.00","0.00"
"S00000010","2015","W42 11/01/2016","0.00","0.00","0.00","0.00","0.00","0.00","568.75","0.00","0.00"
It generates the new files, named after the value in column 1 (service_id). There are 2 problems:
The output CSV file contains a header row, which I don't need.
The columns are enclosed in double quotes, which I don't need.
First of all, a .csv file normally has a header and quote marks as part of its structure. But if you don't want them, you can go with a plain text file instead, for example:
$temp = Import-Csv INPUT_FILE.csv -Header service_id,year,period,cash_exp,cash_inc,cash_def,act_exp,act_inc,act_def,comm_exp,comm_inc,comm_def |
    Group-Object -Property "service_id" |
    Foreach-Object {
        $path = $_.name + ".csv"
        $temp0 = $_.group | ConvertTo-Csv -NoTypeInformation | Select-Object -Skip 1
        $temp1 = $temp0.replace("""", "")
        $temp1 > $path
    }
But this output is not a "real" CSV file. Hope that helps.
For your particular scenario you could probably use a simpler approach. Read the input file as a plain text file, group the lines by splitting off the first field, then write the groups to output files named after the groups:
Get-Content 'INPUT_FILE.csv' | Group-Object { $_.Split(',')[0] } | ForEach-Object { $_.Group | Set-Content ($_.Name + '.csv') }
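A sketch of the same grouping on a few lines from the question (shortened to their first four fields for brevity), writing to a hypothetical temporary folder rather than the current directory:

```powershell
# Lines from the question, shortened to their first four fields.
$lines = @(
    'S00000009,2016,M04 01/07/2016,0.00'
    'S00000009,2016,M05 01/08/2016,0.00'
    'S00000010,2015,W28 05/10/2015,2275.00'
)

# Hypothetical output folder for the demo.
$outDir = Join-Path ([System.IO.Path]::GetTempPath()) 'split-demo'
New-Item -ItemType Directory -Path $outDir -Force | Out-Null

# Group by the first comma-separated field and write each group, unchanged, to <field>.csv.
$lines |
    Group-Object { $_.Split(',')[0] } |
    ForEach-Object { $_.Group | Set-Content (Join-Path $outDir ($_.Name + '.csv')) }

Get-ChildItem $outDir -Filter '*.csv' | Select-Object Name
```

Because the lines are never parsed as CSV, they are written back exactly as read: no header row and no added quotes.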
Another solution uses plain numbers instead of named headers (since the headers aren't wanted in the output anyway) and avoids unnecessary temporary files. It removes only the double quotes that delimit fields:
Import-Csv INPUT_FILE.csv -Header (1..12) |
    Group-Object -Property "1" |
    Foreach-Object {
        ($_.Group | ConvertTo-Csv -NoType | Select-Object -Skip 1).Trim('"') -replace '","', ',' |
            Set-Content -Path ("{0}.csv" -f $_.Name)
    }
Convert txt file to csv and ignore specific text Line
$userObj = [PSCustomObject]((Get-Content -Raw C:\Automation\sam.txt) -replace ':', '=' | ConvertFrom-StringData)
$name = Get-Item C:\Automation\sam.txt | Select-Object -ExpandProperty BaseName
$userObj | Export-Csv C:\Automation\$name.csv
I am using the above script to convert a txt file to CSV, but I don't know how to exclude the word "Line" and the dashes underlining it. The script works if I delete "Line ----" from the file and run it again.
Skip the first 2 lines when importing the file:
$data = (Get-Content 'C:\path\to\input.txt') -replace ':', '=' |
    Select-Object -Skip 2 |
    Out-String |
    ConvertFrom-StringData
[PSCustomObject]$data | Export-Csv 'C:\path\to\output.csv' -NoType
Better yet, change the process that creates your input file, so that it doesn't produce the offending header. The data seems to be generated by something like
... | Select-String ... | Select-Object Line | Set-Content ...
To remove the header from the output you just need to change Select-Object Line to Select-Object -Expand Line.
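A sketch of the skip-then-parse approach on hypothetical in-memory content (the Name and Department keys are made up). It assumes the header is exactly two lines; keep in mind that -replace ':', '=' also rewrites any colons inside the values:

```powershell
# Hypothetical file content: a "Line" header, its dashed underline, then "key: value" pairs.
$content = @(
    'Line'
    '----'
    'Name: Sam'
    'Department: IT'
)

# Turn "key: value" into "key= value", skip the two header lines, and parse as key=value data.
$data = $content -replace ':', '=' |
    Select-Object -Skip 2 |
    Out-String |
    ConvertFrom-StringData

# ConvertFrom-StringData returns a hashtable; cast it to an object for Export-Csv.
$userObj = [pscustomobject]$data
$userObj
```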
How to write the header from a .csv file to an array using powershell?
How do I read only the header from a CSV file and write the column names into an array? I have found a solution using the following cmdlets:
$obj = Import-Csv '.\users.csv' -Delimiter ';'
$headerarray = ($obj | Get-member -MemberType 'NoteProperty' | Select-Object -ExpandProperty 'Name')
But the problem is that the names are automatically sorted alphabetically. Does anyone have a solution for this?
You can get the column names of a CSV file, in their original order, like this:
import-csv <csvfilename> | select-object -first 1 | foreach-object { $_.PSObject.Properties } | select-object -expandproperty Name
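A small sketch contrasting the two approaches on hypothetical in-memory data whose column names are deliberately not in alphabetical order:

```powershell
# Hypothetical semicolon-delimited data with non-alphabetical column names.
$csv = @(
    'Zeta;Alpha;Mid'
    '1;2;3'
) | ConvertFrom-Csv -Delimiter ';'

# PSObject.Properties enumerates the columns in file order.
$inFileOrder = @($csv | Select-Object -First 1 |
    ForEach-Object { $_.PSObject.Properties } |
    Select-Object -ExpandProperty Name)

# Get-Member lists members sorted by name, which is what the question ran into.
$sorted = @($csv | Get-Member -MemberType NoteProperty |
    Select-Object -ExpandProperty Name)

"File order: $($inFileOrder -join ', ')"
"Get-Member: $($sorted -join ', ')"
```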