How to detect and remove CSV columns based on common text in header names?


I am working on a CSV File which I recently created. The CSV file contains columns with headers and corresponding rows.
I need to remove entire columns (including their data) whose headers share a specific piece of text. For example, column 1 has the header intID, column 2 has the header boolID, column 3 has the header charID, and so on ('ID' being the common text). Some columns don't have 'ID' in their headers, and those need to be retained.
The CSV file is generated dynamically, so there may be more or fewer columns depending on the data we select. Either way, the columns whose headers contain the common text must be removed.
How can we achieve this?

Would something like this do the trick?
$yourfile = "<path to your csv>"
# Import the CSV
$csv = Import-Csv -Path $yourfile
# Find all columns that do not end with "ID"
$colsToKeep = $csv | Get-Member -MemberType NoteProperty |?{$_.name -notmatch "^.+ID$"} | Select-Object -ExpandProperty name
# Filter out all unwanted columns
$newCsv = $csv | Select-Object -Property $colsToKeep
# Export CSV to new file
$newCsv | Export-Csv -Path "<path to new csv>" -NoTypeInformation
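One thing to be aware of: Get-Member returns property names sorted alphabetically, so the snippet above may reorder the remaining columns. Here is a minimal sketch that keeps the original column order. The sample data is made up, and the match is case-sensitive on a trailing "ID", which is an assumption about what the headers look like:

```powershell
# Hypothetical sample data standing in for the real CSV file
$csv = @'
intID,Name,boolID,City,charID
1,Alice,true,Oslo,a
2,Bob,false,Rome,b
'@ | ConvertFrom-Csv

# Property names in their original column order
$headers = $csv[0].PSObject.Properties.Name

# Keep every header that does not end with "ID" (-cnotmatch is case-sensitive)
$colsToKeep = @($headers | Where-Object { $_ -cnotmatch 'ID$' })

# Drop the unwanted columns and show the result as CSV text
$newCsv = $csv | Select-Object -Property $colsToKeep
$newCsv | ConvertTo-Csv -NoTypeInformation
```

Replace ConvertFrom-Csv and ConvertTo-Csv with Import-Csv and Export-Csv to work on files.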

Assuming the following:
the ID part is not a plain text "ID" but a dynamic arbitrary text
headers of interest start with int, char, bool
Let's count occurrences of ID part and build a list of headers used just once, then export the CSV.
$csv = Import-Csv 1.csv
$prefix = '^(int|char|bool)' # or '^[a-z]+' for any lowercase text
$headers = $csv[0].PSObject.Properties.Name
$uniqueIDs = $headers -creplace $prefix, '' | group | ? Count -eq 1 | select -expand Name
$uniqueHeaders = $headers | ?{ $_ -creplace $prefix, '' -in $uniqueIDs }
$csv | select $uniqueHeaders | Export-Csv 2.csv -NoTypeInformation
Note: in the old PowerShell 2.0 instead of ? Count -eq 1 use ?{ $_.Count -eq 1 }
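To make the counting step concrete, here is a small in-memory sketch of the same idea. The headers intA, charA, boolB and Name are invented for illustration: intA and charA share the ID part "A" and are dropped, while boolB and Name are used only once and are kept.

```powershell
# Hypothetical sample data; intA and charA share the ID part "A"
$csv = @'
intA,charA,boolB,Name
1,x,true,Alice
2,y,false,Bob
'@ | ConvertFrom-Csv

$prefix  = '^(int|char|bool)'
$headers = $csv[0].PSObject.Properties.Name

# Strip the type prefix from every header: A, A, B, Name
$ids = $headers -creplace $prefix, ''

# ID parts that occur exactly once
$uniqueIDs = $ids | Group-Object | Where-Object { $_.Count -eq 1 } |
    Select-Object -ExpandProperty Name

# Headers whose ID part is unique, in their original order
$uniqueHeaders = $headers | Where-Object { ($_ -creplace $prefix, '') -in $uniqueIDs }

$csv | Select-Object $uniqueHeaders | ConvertTo-Csv -NoTypeInformation
```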

Related

Powershell - Using ConvertFrom-csv

I'm brand new to Powershell. I have a variable that contains comma separated values. What I want to do is read each entry in the csv string variable, and assign it to a variable. I am using ConvertFrom-csv to separate the data with headers.
How can I assign each value to a variable, or even better, use ConvertTo-csv to create a new csv string which only has, for example, columns 2/3/6/7 in it?
I would ultimately want to write that data out to a new csv file.
Here is my test code:
#Setup the variable
$Data = "test1,test2,test3,1234,5678,1/1/2021,12/31/2021"
$Data | ConvertFrom-csv -Header Header1,Header2, Header3, Header4, Header5, Header6, Header7
# Verify that an object has been created.
$Data |
ConvertFrom-csv -Header Header1,Header2, Header3, Header4, Header5, Header6, Header7 |
Get-Member
#Show header1
Write-Host "--------Value from $Data----------------------------------------"
$Data[0] #doesn't work, only displays the first character of the string
Write-Host "-----------------------------------------------------------------"

Let me suggest a different approach. If you use ConvertFrom-Csv and assign the result to a variable ($data), that variable will hold an array of custom objects. You can run this through a loop that steps through the elements of the array one at a time, and then through an inner loop that steps through the properties of each object one at a time, setting a variable with the same name as the field header and the same value as the current record's value.
I don't have code that does exactly what you want, but I'm including code that I wrote a few years back that does something similar, only using Import-Csv instead of ConvertFrom-Csv.
Import-Csv $driver | % {
$_.psobject.properties | % {Set-variable -name $_.name -value $_.value}
Get-Content $template | % {$ExecutionContext.InvokeCommand.ExpandString($_)}
}
Focus on the first inner loop. Each property of the current object will have a name that came from the header and a value that came from the current record of the CSV file. You can ignore the line that says ExpandString. That's just what I chose to do with the variables once they had been defined.
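Applied to the string from the question, the same idea might look like the sketch below. It assumes the seven generated names Header1 to Header7 are acceptable as variable names; nothing else is taken from the original code.

```powershell
# The CSV string from the question
$Data = "test1,test2,test3,1234,5678,1/1/2021,12/31/2021"

# Generate Header1..Header7 (assumed names, one per field)
$headers = 1..7 | ForEach-Object { "Header$_" }

# Keep the converted objects in their own variable
$records = $Data | ConvertFrom-Csv -Header $headers

foreach ($record in $records) {
    # Create one variable per column, named after the header
    foreach ($property in $record.PSObject.Properties) {
        Set-Variable -Name $property.Name -Value $property.Value
    }
    # The variables now hold the values of the current record
    "$Header2 / $Header6"
}
```

Note that $Data itself is still a plain string, which is why $Data[0] in the question returned only the first character.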

How can I assign each value to a variable, or even better, use ConvertTo-Csv to create a new csv string which only has, for example, columns 2/3/6/7 in it?
This is one way of automating this:
# Define the CSV without headers
$Data = "test1,test2,test3,1234,5678,1/1/2021,12/31/2021"
# Set the number of headers needed
$headers = $Data.Split(',') | ForEach-Object -Begin { $i = 1 } -Process {
"Header$i"; $i++
}
# Set the desired columns we want
$desiredColumns = 2,3,6,7 | ForEach-Object { $_ - 1 } | ForEach-Object {
$headers[$_]
}
# Convert to CSV and filter by Desired Columns
$Data | ConvertFrom-Csv -Header $headers | Select-Object $desiredColumns
Result
Header2 Header3 Header6 Header7
------- ------- ------- -------
test2 test3 1/1/2021 12/31/2021
Result as CSV
$Data | ConvertFrom-Csv -Header $headers |
Select-Object $desiredColumns | ConvertTo-Csv -NoTypeInformation
"Header2","Header3","Header6","Header7"
"test2","test3","1/1/2021","12/31/2021"

How to export data into a specific column in a csv file in PowerShell

What I am trying to do is import data from a CSV file that contains UserPrincipalNames, take the part of each name before the @ symbol, and then export that data to a specific column in the same CSV file, which in this case is o365Users.csv. I am able to write it out to a text file, but I need to know how to export it to column G with the header name SAM.
This is my code:
$Addys = Import-Csv "C:\scripts\o365Users.csv"
$UPNs = $Addys.UserPrincipalName
foreach ($UPN in $UPNs) {
$Name = $UPN.Split("@")[0]
Write-Output $Name >> c:\scripts\o365Names.txt
}

To append a new column with the header SAM use Select-Object with a calculated property:
(Import-Csv 'C:\scripts\o365Users.csv') |
Select-Object -Property *,@{n='SAM';e={$_.UserPrincipalName.Split('@')[0]}}
If the new property has to be in a specific position you can't use the wildcard * but will have to enumerate all headers/columns/properties in the desired order, i.e.
(Import-Csv 'C:\scripts\o365Users.csv') |
Select-Object -Property ColA,ColB,ColC,ColD,ColE,ColF,@{n='SAM';e={$_.UserPrincipalName.Split('@')[0]}},ColH
Replace the Col_ names with your real headers.
Because the Import-Csv call is enclosed in parentheses, you can export to the same file name (not recommended while still testing); simply append
| Export-Csv 'C:\scripts\o365Users.csv' -NoTypeInformation
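The calculated property can be tried without touching any file. The sketch below uses made-up users and the placeholder domain contoso.com, and converts to CSV text instead of exporting:

```powershell
# Hypothetical sample data standing in for o365Users.csv
$users = @'
DisplayName,UserPrincipalName
Alice Example,alice@contoso.com
Bob Example,bob@contoso.com
'@ | ConvertFrom-Csv

# Append a SAM column holding the part of the UPN before the @ sign
$withSam = $users | Select-Object -Property *, @{
    Name       = 'SAM'
    Expression = { $_.UserPrincipalName.Split('@')[0] }
}

$withSam | ConvertTo-Csv -NoTypeInformation
```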

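Here is a quick way to get just the output you are looking for: import the current CSV, create an empty output list, add each name to it in the loop, then export the CSV. Each name is wrapped in an object with a SAM property, because piping plain strings to Export-Csv would export their Length rather than their text.
$Addys = Import-Csv "C:\scripts\o365Users.csv"
$UPNs = $Addys.UserPrincipalName
[System.Collections.ArrayList]$Output = @()
foreach ($UPN in $UPNs) {
$Name = $UPN.Split("@")[0]
$Output.Add([pscustomobject]@{ SAM = $Name }) | Out-Null
}
# Note: this overwrites the original file with just the SAM column
$Output | Export-Csv -Path "C:\scripts\o365Users.csv" -NoTypeInformation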

Select specific column based on data supplied using Powershell

I have a csv file that may have unknown headers, one of the columns will contain email addresses for example.
Is there a way to select only the column that contains the email addresses and save it as a list to a variable?
One csv could have the header say email, another could say emailaddresses, another could say email addresses another file might not even have the word email in the header. As you can see, the headers are different. So I want to be able to detect the correct column first and use that data further in the script. Once the column is identified based on the data it contains, select that column only.
I've tried the where-object and select-string cmdlets. With both, the output is the entire array and not just the data in the column I am wanting.
$CSV = import-csv file.csv
$CSV | Where {$_ -like "*@domain.com"}
This outputs the entire array as all rows will contain this data.

Sample Data for visualization
id,first_name,bagel,last_name
1,Base,bcruikshank0@homestead.com,Cruikshank
2,Regan,rbriamo1@ebay.co.uk,Briamo
3,Ryley,rsacase2@mysql.com,Sacase
4,Siobhan,sdonnett3@is.gd,Donnett
5,Patty,pesmonde4@diigo.com,Esmonde
Bagel is obviously the column we are trying to find. We will pretend that we have no knowledge of the column's name or position ahead of time.
Find column dynamically
# Import the CSV
$data = Import-CSV $path
# Take the first row and get its columns
$columns = $data[0].psobject.properties.name
# Cycle the columns to find the one that has an email address for a row value
# Use a VERY crude regex to validate an email address.
$emailColumn = $columns | Where-Object{$data[0].$_ -match ".+@.+\..+"}
# Example of using the found column(s) to display data.
$data | Select-Object $emailColumn
Basically, read in the CSV as normal and use the first row's data to figure out which column holds the email addresses. One caveat: if more than one column matches, all of them are returned.
To enforce a single result, pipe to Select-Object -First 1. Then you just have to hope the first one is the "right" one.
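If relying on the first row alone feels too fragile, one possible variation is to accept a column only when every row's value looks like an email address. This is a sketch on a shortened copy of the sample data; the pattern is an assumption and still only a crude check:

```powershell
# Shortened copy of the sample data
$data = @'
id,first_name,bagel,last_name
1,Base,bcruikshank0@homestead.com,Cruikshank
2,Regan,rbriamo1@ebay.co.uk,Briamo
'@ | ConvertFrom-Csv

$columns = $data[0].PSObject.Properties.Name

# Crude pattern: something, an @, something, a dot, something
$emailPattern = '^[^@\s]+@[^@\s]+\.[^@\s]+$'

$emailColumn = $columns | Where-Object {
    $name = $_
    # Collect the rows whose value in this column does NOT look like an email
    $nonMatching = @($data | Where-Object { $_.$name -notmatch $emailPattern })
    $nonMatching.Count -eq 0
} | Select-Object -First 1

# Save the column's values as a plain list
$emails = $data | Select-Object -ExpandProperty $emailColumn
```

Checking every row is slower on large files, so sampling the first few rows may be a reasonable compromise.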

If you're using Import-Csv, the result is a PSCustomObject.
$CsvObject = Import-Csv -Path 'C:\Temp\Example.csv'
$Header = ($CsvObject | Get-Member | Where-Object { $_.Name -like '*email*' }).Name
$CsvObject.$Header
This filters for the header containing email, then selects that column from the object.
Edit for requirement:
$Str = @((Get-Content -Path 'C:\Temp\Example.csv') -like '*@domain.com*')
$Headers = @((Get-Content -Path 'C:\Temp\Example.csv' -TotalCount 1) -split ',')
$Str | ConvertFrom-Csv -Delimiter ',' -Header $Headers

Other method:
$PathFile="c:\temp\test.csv"
$columnName=$null
$content=Get-Content $PathFile
foreach ($item in $content)
{
$SplitRow= $item -split ','
$Cpt=0..($SplitRow.Count - 1) | where {$SplitRow[$_] -match ".+@.+\..+"} | select -first 1
if ($null -ne $Cpt) # an index of 0 is a valid match, so test against $null
{
$columnName=($content[0] -split ',')[$Cpt]
break
}
}
if ($columnName)
{
import-csv "c:\temp\test.csv" | select $columnName
}
else
{
"No email column found"
}

Powershell removing columns and rows from CSV

I'm having trouble making some changes to a series of CSV files, all with the same data structure. I'm trying to combine all of the files into one CSV file or one tab delimited text file (don't really mind), however each file needs to have 2 empty rows removed and two of the columns removed, below is an example:
col1,col2,col3,col4,col5,col6 <-remove
col1,col2,col3,col4,col5,col6 <-remove
col1,col2,col3,col4,col5,col6
col1,col2,col3,col4,col5,col6
^ ^
remove remove
End Result:
col1,col2,col4,col6
col1,col2,col4,col6
This is my attempt at doing this (I'm very new to Powershell)
$ListofFiles = "example.csv" #this is a list of all the CSV files
ForEach ($file in $ListofFiles)
{
$content = Get-Content ($file)
$content = $content[2..($content.Count)]
$contentArray = @()
[string[]]$contentArray = $content -split ","
$content = $content[0..2 + 4 + 6]
Add-Content '...\output.txt' $content
}
Where am I going wrong here...

Read your example file before the foreach loop to get the list of files:
$ListofFiles = get-content "example.csv"
Inside the foreach, do not read the main file with
$content = Get-Content ($ListofFiles)
but read the current file instead:
$content = Get-Content $file
To drop rows, I recommend keeping only the rows you want by index:
$obj = get-content C:\t.csv | select -Index 0,1,3
To drop columns, keep only the ones you want (here column indexes 0, 1, 3 and 5):
$obj | %{(($_.split(","))[0,1,3,5]) -join "," } | out-file test.csv -Append

Assuming the initial files look like
col1,col2,col3,col4,col5,col6
col1,col2,col3,col4,col5,col6
,,,,,
,,,,,
You can also try this one-liner:
Import-Csv D:\temp\*.csv -Header 'C1','C2','C3','C4','C5','C6' | where {$_.c1 -ne ''} | select -Property 'C1','C2','C4','C6' | Export-Csv 'd:\temp\final.csv' -NoTypeInformation
Because your CSV files all have the same structure, you can open them directly while supplying the header, remove the objects with missing data, keep only the wanted columns, and then export all the objects to a single CSV file.

It is enough to supply made-up column names (you may list more names than the file has columns), filter out the rows you don't want, and exclude the columns you don't need. Export-Csv sits after the loop so that all files end up in one output file, and the explicit property list (*) is there because Windows PowerShell ignores -ExcludeProperty when no -Property is given.
gci "c:\yourdirwithcsv" -file -filter *.csv |
%{ Import-Csv $_.FullName -Header C1,C2,C3,C4,C5,C6 } |
where C1 -ne '' |
select * -ExcludeProperty C3, C5 |
export-csv "c:\temp\merged.csv" -NoTypeInformation

Use Import-Csv to read changeable column Titles by location

I'm trying to see if there is a way to read the column values in a CSV file based on the column location. The reason for this is that the file I'm being handed always has its titles being changed...
For example, let's say column A of the CSV file (as seen in Excel) looks like the following:
ColumnOne
ValueOne
ValueTwo
ValueThree
Now the user changes the title:
Column 1
ValueOne
ValueTwo
ValueThree
Now I want to create an array of the first column. Normally what I do is the following:
$arrayFirstColumn = Import-Csv 'C:\test\test1.csv' | where-object {$_.ColumnOne} | select-object -expand 'ColumnOne'
However, as we can see if ColumnOne is changed to Column 1, it breaks this code. How can I create this array to allow an interchangeable column title, but the column location will always be the same?

You can specify headers of your own on import:
Import-Csv 'C:\path\to\your.csv' -Header 'MyHeaderA','MyHeaderB',...
As long as you don't export the data back to a CSV (or don't require the original headers to be in the output CSV as well) you can use whatever names you like. You can also specify as many header names as you like. If their number is less than the number of the columns in the CSV the additional columns will be omitted, if it's greater then the columns for the additional headers will be empty.
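One detail worth knowing is that when -Header is supplied, the file's own header line is read as a data row, so it has to be skipped. The sketch below shows this with ConvertFrom-Csv on in-memory lines; the names First and Second and the extra columns are made up for illustration:

```powershell
# Hypothetical file content, one array element per line
$lines = @(
    'ColumnOne,ColumnTwo,ColumnThree'
    'ValueOne,A,1'
    'ValueTwo,B,2'
    'ValueThree,C,3'
)

# Skip the original header line, then apply our own header names.
# Only two names are given, so the third column is discarded.
$rows = $lines | Select-Object -Skip 1 | ConvertFrom-Csv -Header 'First', 'Second'

# The first column, regardless of what the file called it
$arrayFirstColumn = $rows | Select-Object -ExpandProperty First
```

With a file, the equivalent would be Import-Csv with -Header followed by Select-Object -Skip 1.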
If you need to preserve the original headers you could get the header name(s) you need to work with in variable(s) like this:
$csv = Import-Csv 'C:\test\test1.csv'
$firstCol = $csv | Select-Object -First 1 | ForEach-Object {
$_.PSObject.Properties | Select-Object -First 1 -Expand Name
}
$arrayFirstColumn = $csv | Where-Object {$_.$firstCol} |
Select-Object -Expand $firstCol
Or you could simply read the first line from the CSV and split it to get an array with the headers:
$headers = (Get-Content 'C:\test\test1.csv' -TotalCount 1) -split ','
$firstCol = $headers[0]

One option:
$ImportFile = 'C:\test\test1.csv'
$FirstColumn = ((Get-Content $ImportFile -TotalCount 2 | ConvertFrom-Csv).psobject.properties.name)[0]
$FirstColumn
$arrayFirstColumn = Import-Csv $ImportFile | where-object {$_.$FirstColumn} | select-object -expand $FirstColumn

If you are using PowerShell v2.0 then the expression for $FirstColumn in mjolinor's answer would be:
$FirstColumn = ((Get-Content $ImportFile -TotalCount 2 | ConvertFrom-Csv).psobject.properties | ForEach-Object {$_.name})[0]
(Apologies for starting a new answer; I do not yet have enough reputation to add a comment to mjolinor's post)
