Deleting columns from CSV using PowerShell - powershell

I have a CSV file that has duplicate column headers, so I can't use Import-Csv to do the work. The header names are dynamic. I need to get the third column, the fourth column, and every fourth column after that (e.g., counting from 0: columns 2, 3, 7, 11, 15...).
The reason I have duplicate column names is that header 3 needed the same name as header 0, in groups of four. 0 > 3, 4 > 7, 8 > 11...
I used get-Content because I couldn't figure out how to make this work with Import-Csv. I had to use Import-Csv to get the number of columns, which I couldn't figure out with Get-Content.
#Rename every fourth column
$file = "C:\Scripts\File.csv"
$data = get-content $file
$step = 4
$csv = Import-Csv "C:\Scripts\File.csv"
$headers = $data | select -first 1
$count = $csv[0].PSObject.Properties | select -Expand Name
for ($i = 0; $i -lt $count.count; $i += $step)
{
$headers = $headers -split ","
$headers[($i + 3)] = $headers[$i]
$headers[($i + 2)] = "timestamp"
$headers = $headers -join ","
$data[0] = $headers
$data | Set-Content "C:\Scripts\File.csv"
}
I can reuse the variable $count if needed (for $count.count), so I don't have to use Import-Csv again. I'm having trouble figuring out how to get just the columns I need based on number and not header name.
This worked great for getting the third column (2nd if starting from 0), but I'm not sure how to get every fourth after that (3rd if starting from 0)
type "C:\Scripts\File.csv" | % { $_.Split(",") | select -skip 2 -first 1 }
Screenshots below. Keep in mind I do not know the headers names of every fourth column as they could be anything, I only know which column number the data is in (every fourth column).

I'd re-think that whole process and start with this:
$file = "C:\Scripts\File.csv"
$HeaderCount = ((gc $file -TotalCount 1).split(',')).count -1
$CSV = import-csv $file -Header (0..$HeaderCount)
Now you can treat those column headings like array indexes to extract out the columns you want.
Use Select -Skip 1 to strip off the original header row. You can rewrite the property names for export using calculated properties or just create new objects, using property names extracted from the original header row.
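As a minimal sketch of that idea (the sample data and the names `$lines`, `$rows` and `$picked` are made up for illustration), numeric header names let you pick columns by position even when the real header row contains duplicates, and a calculated property puts a readable name back on output:

```powershell
# Hypothetical sample: the header row repeats the name "host".
$lines = @(
    'host,ip,time,host',
    'srv1,10.0.0.1,08:00,srv1',
    'srv2,10.0.0.2,09:00,srv2'
)

# Count the columns from the first line, then use 0..N as header names.
$lastIndex = $lines[0].Split(',').Count - 1
$rows = $lines | ConvertFrom-Csv -Header (0..$lastIndex) | Select-Object -Skip 1

# Pick columns 2 and 3 by position; rename column 2 with a calculated property.
$picked = $rows | Select-Object @{ Name = 'timestamp'; Expression = { $_.'2' } }, '3'
$picked
```

ConvertFrom-Csv is used here only so the sketch runs without a file; Import-Csv accepts the same -Header argument.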
OK, based on the posted data, try this:
$file = "C:\Scripts\File.csv"
$OutputFile = "C:\Scripts\OutputFile.csv"
$HeaderCount = ((Get-Content $file -TotalCount 1).split(',')).count -1
$CSV = import-csv $file -Header (0..$HeaderCount)
$SelectedColumns = @(2) + ( (0..$HeaderCount) |? { ($_ % 4) -eq 3 } ) -as [string[]]
$CSV |
select $SelectedColumns |
ConvertTo-CSV -NoTypeInformation |
Select -Skip 1 |
Set-Content $OutputFile
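To see which positions that selection expression produces, here is a small worked example assuming a 12-column file, so that `$HeaderCount` is 11 (the column count is only an illustration):

```powershell
# Assume 12 columns, i.e. zero-based indexes 0..11.
$HeaderCount = 11

# Column 2, plus every index whose remainder modulo 4 is 3 (3, 7, 11, ...).
$SelectedColumns = @(2) + ((0..$HeaderCount) | Where-Object { ($_ % 4) -eq 3 }) -as [string[]]

$SelectedColumns -join ','   # 2,3,7,11
```

The cast to [string[]] matters because the generated header names are strings, and Select-Object matches property names rather than numeric positions.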

Related

How to set a variable to a column with no header in a tab delimited text file

Barcode1 Plate # 12/29/2017 07:35:56 EST
A 1 4 5 6
A 1 4 5 6
A 1 4 5 6
A 1 4 5 6
A 1 4 5 6
A 1 4 5 6
A 1 4 5 6
Above is an example of a tab-delimited text file. I need to get the data from the columns with no header, namely the ones at the end, and I don't know how to identify them. I am trying to swap columns and output a text file. The source data file format is the same every time.
This is part of what I have:
$swapColumns = @{
column1 = @{
name = "date-header"
instance = 1
}
column2 = @{
name = "Blank"
instance = 1
}
}
$formats = @(
'XR-{0:yyyyMMdd}-01.txt'
)
$date = [datetime]::now
$ErrorActionPreference = 'Stop'
function Get-HeaderIndex {
param(
[System.Collections.Generic.List[string]]$Source,
[string]$Header,
[uint16]$Instance
)
$index = 0;
for ($i = 0; $i -lt $Instance; $i++) {
$index = $Source.IndexOf($Header, $index, ($Source.Count - $index))
if (($index -eq -1) -or (($i + 1) -eq $Instance)) {
break
}
$index = $index + 1
}
if ($index -eq -1) { throw "index not found" }
return $index
}
#grabs the first item in folder matching UCX-*.txt
$fileDetails = Get-ChildItem $PSScriptRoot\UCX-*.txt | select -First 1
#gets the file contents
$file = Get-Content $fileDetails
#break up script in sections that look like '======section======'
#and store the section name and line number it starts on
$sections = @()
for ($i = 0; $i -lt $file.Count; $i++) {
if ($file[$i] -match '^=+(\w+)=+$') {
$section = $Matches[1]
$sections += [pscustomobject]@{line = $i; header = $section}
}
}
#get the data section
$dataSection = $sections | ? {$_.header -eq 'data'}
#get the section following data
$nextSection = $sections | ? {$_.line -gt $dataSection.line} | sort -Property line | select -First 1
#get data column headers
$dataHeaders = New-Object System.Collections.Generic.List[string]
$file[$dataSection.line + 1].split("`t") | % {
[datetime]$headerDateValue = [datetime]::MinValue
$headerIsDate = [datetime]::TryParse($_.Replace('EST','').Trim(),
[ref] $headerDateValue)
if ($headerIsDate) {
$dataHeaders.Add('date-header')
}
else {
$dataHeaders.Add($_)
}
}
#get index of columns defined in $swapColumns
$column1 = Get-HeaderIndex -Source $dataHeaders -Header $swapColumns.column1.name -Instance $swapColumns.column1.instance
$column2 = Get-HeaderIndex -Source $dataHeaders -Header $swapColumns.column2.name -Instance $swapColumns.column2.instance
#iterate over each row in data section, swap data from column1/column2
for ($i = $dataSection.line + 2; $i -lt $nextSection.line - 1; $i++) {
$line = $file[$i]
$parts = $line.split("`t")
$tmp1 = $parts[$column1]
$parts[$column1] = $parts[$column2]
$parts[$column2] = $tmp1
$file[$i] = $parts -join "`t"
}
#write new file contents to files with names defined in $formats
$formats | % {
$file | Out-File ($_ -f $date) -Force
}
If you know what your file format is going to be, ignore whatever the current header is and supply your own header names when converting the file to CSV objects.
It looks like you need to parse the date out of the header, which should be trivial. Grab it from $fileHeader however you like.
$wholeFile = Get-Content C:\temp\test.txt
$fileHeader = $wholeFile[0] -split "`t"
$newHeader = "Barcode1", "Plate #", "Date", "Plumbus", "Dinglebop"
$wholeFile |Select-Object -Skip 1 | ConvertFrom-Csv -Delimiter "`t" -Header $newHeader
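One possible way to grab the date, sketched under the assumption that the header cell always looks like the one in the sample file (`MM/dd/yyyy HH:mm:ss` followed by `EST`); the variable names `$headerCell`, `$stamp` and `$date` are illustrative:

```powershell
# Hypothetical header cell, copied from the sample file in the question.
$headerCell = '12/29/2017 07:35:56 EST'

# Drop the time-zone abbreviation before parsing, as the question's own code does.
$stamp = $headerCell.Replace('EST', '').Trim()

# Parse with an explicit format so the result does not depend on the machine's culture.
$date = [datetime]::ParseExact($stamp, 'MM/dd/yyyy HH:mm:ss', [System.Globalization.CultureInfo]::InvariantCulture)

$date.ToString('yyyyMMdd')   # 20171229
```

The resulting value could then feed the file-name pattern from the question, for example 'XR-{0:yyyyMMdd}-01.txt' -f $date.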
If the column widths are always the same, there's another option: specify the column positions manually. For example:
$content = Get-Content C:\temp.tsv
$columns = 13, 24, 35 | Sort -Descending
$Delimiter = ','
$Results = $content | % {
$line = $_
$columns | % {
$line = $line.Insert($_, $Delimiter)
}
$line
} |
ConvertFrom-Csv -Delimiter $Delimiter
Results:
Barcode1 Plate # H1 12/29/2017 07:35:56 EST
--------- ----------- -- -----------------------
A 1 4 5
A 1 4 5
A 1 4 5
A 1 4 5
A 1 4 5
A 1 4 5
A 1 4 5
Then you can easily get the data you need:
$Results[0].H1
4
[This answer doesn't solve the OP's problem after clarifying the exact requirements, but may be of general interest to some, given the question's generic title.]
If the file is really tab-delimited, you can use Import-Csv -Delimiter "`t" to read it, in which case PowerShell will autogenerate header names as H<n> if they're missing, where <n> is a sequence number starting with 1.
Caveat: This doesn't work if the unnamed column is the last one, because - inexplicably - Import-Csv then ignores the entire column (more generally, any run of trailing delimiters).
Import-Csv -Delimiter "`t" file.tsv | Select-Object -ExpandProperty H1

Powershell: How to merge unique headers from one CSV to another?

Edit 1:
So I've figured out how to get the unique headers in CSV 2 to append to CSV 1.
$header = ($table | Get-Member -MemberType NoteProperty).Name
$header_add = ($table_add | Get-Member -MemberType NoteProperty).Name
$header_diff = $header + $header_add
$header_diff = ($header_diff | Sort-Object -Unique)
$header_diff = (Compare-Object -ReferenceObject $header -DifferenceObject $header_diff -PassThru)
$header is an array of headers from CSV 1 ($table). $header_add is an array of headers from CSV 2 ($table_add). $header_diff houses the unique headers in CSV 2 by the end of the code block.
So as far as I'm aware, my next step would be:
$append = ($table_add | Select-Object $header_diff)
My problem now is how do I append these objects to my CSV 1 ($table) object? I don't quite see a way for Add-Member to do this in a particularly nice fashion.
Original:
Here's the headers for the two CSV files I'm trying to combine.
CSV 1:
Date, Name, Assigned Router, City, Country, # of Calls , Calls in , Calls out
CSV 2:
Date, Name, Assigned Router, City, Country, # of Minutes, Minutes in, Minutes out
So a quick rundown of what these files are; both files contain call information for a set of names for one day (the date column has the same date for each row; this is because this eventually gets sent to a master .xlsx file with all dates combined). All of the columns up to Country contain the same values in the same order in both files. The files simply separate the # of calls and # of minutes data. I was wondering if there was a convenient way to move the unlike columns from one CSV to another.
I've tried using something along the lines of:
Import-Csv (Get-ChildItem <directory> -Include <common pattern in file pair>) | Export-Csv <output path> -NoTypeInformation
This didn't combine all of the matching headers and append the unique ones afterwards. Only the first file that's processed kept its unique headers. The second file that was processed had all of those headers and data discarded in the output. Shared header data in the second CSV was added as additional rows.
An example output of my described fail output:
PS > $small | Format-Table
Column_1 Column_2 Column_3
-------- -------- --------
1 a a
1 b b
1 c c
PS > $small_add | Format-Table
Column_1 Column_4 Column_5
-------- -------- --------
1 x x
1 y y
1 z z
PS > Import-Csv (Get-ChildItem ./*.* -Include "small*.csv") | Select-Object * -unique | Format-Table
Column_1 Column_2 Column_3
-------- -------- --------
1 a a
1 b b
1 c c
1
1
1
I was wondering if I could do something like the following algorithm:
Import-Csv CSV_1 and CSV_2 to separate variables
Compare CSV_2 headers to CSV_1 headers, storing the unlike headers in CSV_2 into a separate variable
Select-Object all CSV_1 headers and unlike CSV_2 headers
Pipe the Select-Object output to Export-Csv
The only other method I could think of is doing it line by line, where I would:
Import-Csv both
remove all of the shared columns from CSV_2
change it from the custom object Powershell uses for CSVs to a string
append each line of CSV_2 to each line of CSV_1
It feels a bit unrefined and inflexible (flexibility can probably be dealt with by how columns/headers are isolated so there's no problem appending strings).
* This answer focuses on a high-level-of-abstraction OO solution.
* The OP's own solution relies more on string processing, which has the potential to be faster.
# The input file paths.
$files = 'csv1.csv', 'csv2.csv'
$outFile = 'csvMerged.csv'
# Read the 2 CSV files into collections of custom objects.
# Note: This reads the entire files into memory.
$doc1 = Import-Csv $files[0]
$doc2 = Import-Csv $files[1]
# Determine the column (property) names that are unique to document 2.
$doc2OnlyColNames = (
Compare-Object $doc1[0].psobject.properties.name $doc2[0].psobject.properties.name |
Where-Object SideIndicator -eq '=>'
).InputObject
# Initialize an ordered hashtable that will be used to temporarily store
# each document 2 row's unique values as key-value pairs, so that they
# can be appended as properties to each document-1 row.
$htUniqueRowD2Props = [ordered] @{}
# Process the corresponding rows one by one, construct a merged output object
# for each, and export the merged objects to a new CSV file.
$i = 0
$(foreach($rowD1 in $doc1) {
# Get the corresponding row from document 2.
$rowD2 = $doc2[$i++]
# Extract the values from the unique document-2 columns and store them in the ordered
# hashtable.
foreach($pname in $doc2OnlyColNames) { $htUniqueRowD2Props.$pname = $rowD2.$pname }
# Add the properties represented by the hashtable entries to the
# document-1 row at hand and output the augmented object (-PassThru).
$rowD1 | Add-Member -NotePropertyMembers $htUniqueRowD2Props -PassThru
}) | Export-Csv -NoTypeInformation -Encoding Utf8 $outFile
To put the above to the test, you can use the following sample input:
# Create sample input CSV files
@'
Date,Name,Assigned Router,City,Country,# of Calls,Calls in,Calls out
dt,nm,ar,ct,cy,cc,ci,co
dt2,nm2,ar2,ct2,cy2,cc2,ci2,co2
'@ > csv1.csv
# Same column layout and data as above through column 'Country', then different.
@'
Date,Name,Assigned Router,City,Country,# of Minutes,Minutes in,Minutes out
dt,nm,ar,ct,cy,mc,mi,mo
dt2,nm2,ar2,ct2,cy2,mc2,mi2,mo2
'@ > csv2.csv
The code should produce the following content in csvMerged.csv:
"Date","Name","Assigned Router","City","Country","# of Calls","Calls in","Calls out","# of Minutes","Minutes in","Minutes out"
"dt","nm","ar","ct","cy","cc","ci","co","mc","mi","mo"
"dt2","nm2","ar2","ct2","cy2","cc2","ci2","co2","mc2","mi2","mo2"
Edit 1:
# Read 2 CSVs into PowerShell CSV object
$table = Import-Csv test.csv
$table_add = Import-Csv test_add.csv
# Isolate unique headers in second CSV
$unique_headers = (Compare-Object -ReferenceObject $table[0].PSObject.Properties.Name -DifferenceObject $table_add[0].PSObject.Properties.Name | Where-Object SideIndicator -eq "=>").InputObject
# Convert CSVs to strings, with second CSV only containing unique columns
$table_str = ($table | ConvertTo-Csv -NoTypeInformation)
$table_add_str = ($table_add | Select-Object $unique_headers | ConvertTo-Csv -NoTypeInformation)
# Append CSV 2's unique columns to CSV 1
# Set line counter
$line = 0
# Concatenate CSV 2 lines to the end of CSV 1 lines until one or both are out of lines
While (($table_str[$line] -ne $null) -and ($table_add_str[$line] -ne $null)) {
If ($line -eq 0) {
$table_sum_str = $table_str[$line] + "," + $table_add_str[$line]
}
If ($line -ne 0) {
$table_sum_str = $table_sum_str + "`n" + ($table_str[$line] + "," + $table_add_str[$line])
}
$line = $line + 1
}
$table_sum_str | Set-Content -Path $outpath -Encoding UTF8
Using Measure-Command, the above code on my machine for the most part takes anywhere between 14-17 milliseconds to run. Running Measure-Command on mklement's yields effectively the same times from just eyeballing it.
Note that for both solutions, the data in the 2 CSV files must be in the same order. If you want to add 2 CSVs together that have complementary data but in different orders, you need to use mklement's object-oriented approach and add mechanisms to match the data to a location or name.
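A rough sketch of such a matching mechanism, assuming the rows can be paired on a single key column (here a hypothetical `Name` column; the sample data and the `$byName` and `$merged` names are invented for illustration):

```powershell
# Hypothetical sample data; the rows of the second table are in a different order.
$table = @'
Name,Calls
alice,3
bob,5
'@ | ConvertFrom-Csv

$table_add = @'
Name,Minutes
bob,50
alice,30
'@ | ConvertFrom-Csv

# Index the second table by the assumed key column, Name.
$byName = @{}
foreach ($row in $table_add) { $byName[$row.Name] = $row }

# Build one merged object per row of the first table, looking up its partner by key.
$merged = foreach ($row in $table) {
    $partner = $byName[$row.Name]
    $minutes = $null
    if ($partner) { $minutes = $partner.Minutes }
    [pscustomobject]@{
        Name    = $row.Name
        Calls   = $row.Calls
        Minutes = $minutes
    }
}

$merged | ConvertTo-Csv -NoTypeInformation
```

With real files the key would more likely be a combination of columns, such as Date and Name, and the property list would be built from the headers instead of being written out by hand.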
Original:
For those who don't want to use a hash table to do this:
# Make sure you're in same directory as files:
# CSV 1
$table = Import-Csv test.csv
# CSV 2
$table_add = Import-Csv test_add.csv
# Get array with CSV 1 headers
$header = ($table | Get-Member -MemberType NoteProperty).Name
# Get array with CSV 2 headers
$header_add = ($table_add | Get-Member -MemberType NoteProperty).Name
# Add arrays of both headers together
$header_diff = $header + $header_add
# Sort the headers, remove duplicate headers (first couple ones), keep unique ones
$header_diff = ($header_diff | Sort-Object -Unique)
# Remove all of CSV 1's unique headers and shared headers
$header_diff = (Compare-Object -ReferenceObject $header -DifferenceObject $header_diff -PassThru)
# Generate a CSV table containing only CSV 2's unique headers
$table_diff = ($table_add | Select-Object $header_diff)
# Convert CSV 1 from a custom PSObject to a string
$table_str = ($table | Select-Object * | ConvertTo-Csv)
# Convert CSV 2 (unique headers only) from custom PSObject to a string
$table_diff_str = ($table_diff | Select-Object * | ConvertTo-Csv)
# Set line counter
$line = 0
# Set flag for if headers have been processed
$headproc = 0
# Concatenate CSV 2 lines to the end of CSV 1 lines until one or both are out of lines.
While (($table_str[$line] -ne $null) -and ($table_diff_str[$line] -ne $null)) {
If ($headproc -eq 1) {
$table_sum_str = $table_sum_str + "`n" + ($table_str[$line] + "," + $table_diff_str[$line])
}
If ($headproc -eq 0) {
$table_sum_str = $table_str[$line] + "," + $table_diff_str[$line]
$headproc = 1
}
$line = $line + 1
}
$table_sum_str | ConvertFrom-Csv | Select-Object * | Export-Csv -Path "./test_sum.csv" -Encoding UTF8 -NoTypeInformation
Ran a quick comparison using Measure-Command between this and mklement0's script.
PS > Measure-Command {./self.ps1}
Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 26
Ticks : 267771
TotalDays : 3.09920138888889E-07
TotalHours : 7.43808333333333E-06
TotalMinutes : 0.000446285
TotalSeconds : 0.0267771
TotalMilliseconds : 26.7771
PS > Measure-Command {./mklement.ps1}
Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 18
Ticks : 185058
TotalDays : 2.141875E-07
TotalHours : 5.1405E-06
TotalMinutes : 0.00030843
TotalSeconds : 0.0185058
TotalMilliseconds : 18.5058
I assume speed differences are because I spend time creating a separate CSV PSObject to isolate columns instead of comparing them directly. mklement's also has the advantage of keeping the columns in the same order.

Powershell removing columns and rows from CSV

I'm having trouble making some changes to a series of CSV files, all with the same data structure. I'm trying to combine all of the files into one CSV file or one tab delimited text file (don't really mind), however each file needs to have 2 empty rows removed and two of the columns removed, below is an example:
col1,col2,col3,col4,col5,col6 <-remove
col1,col2,col3,col4,col5,col6 <-remove
col1,col2,col3,col4,col5,col6
col1,col2,col3,col4,col5,col6
^ ^
remove remove
End Result:
col1,col2,col4,col6
col1,col2,col4,col6
This is my attempt at doing this (I'm very new to Powershell)
$ListofFiles = "example.csv" #this is a list of all the CSV files
ForEach ($file in $ListofFiles)
{
$content = Get-Content ($file)
$content = $content[2..($content.Count)]
$contentArray = @()
[string[]]$contentArray = $content -split ","
$content = $content[0..2 + 4 + 6]
Add-Content '...\output.txt' $content
}
Where am I going wrong here...
Your example file should be read before the foreach, to fetch the file list:
$ListofFiles = get-content "example.csv"
Inside the foreach you are getting the content of the main file
$content = Get-Content ($ListofFiles)
instead of
$content = Get-Content $file
For removing rows I would recommend this, which keeps only the rows at the listed indexes:
$obj = get-content C:\t.csv | select -Index 0,1,3
For removing columns (this keeps column numbers 0, 1, 3, 5):
$obj | %{(($_.split(","))[0,1,3,5]) -join "," } | out-file test.csv -Append
Given that the initial files look like
col1,col2,col3,col4,col5,col6
col1,col2,col3,col4,col5,col6
,,,,,
,,,,,
you can also try this one-liner:
Import-Csv D:\temp\*.csv -Header 'C1','C2','C3','C4','C5','C6' | where {$_.c1 -ne ''} | select -Property 'C1','C2','C4','C6' | Export-Csv 'd:\temp\final.csv' -NoTypeInformation
Since your CSVs all have the same structure, you can open them directly by providing the header, remove the objects with missing data, and then export all the objects to a single CSV file.
It is enough to specify fictitious column names, whose count may exceed the number of columns in the file; adjust the filter as needed and exclude the columns you do not want to keep.
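# -Property * is included because Windows PowerShell ignores -ExcludeProperty without -Property.
# Export-Csv runs once, after the loop, so each file does not overwrite the previous output.
gci "c:\yourdirwithcsv" -file -filter *.csv |
%{ Import-Csv $_.FullName -Header C1,C2,C3,C4,C5,C6 |
where C1 -ne '' |
select -Property * -ExcludeProperty C3, C5
} |
export-csv "c:\temp\merged.csv" -NoTypeInformation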
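A small sketch of the "more header names than columns" behavior, using ConvertFrom-Csv on made-up inline data so it runs without any files (the `$lines` and `$rows` names and the sample values are assumptions):

```powershell
# Hypothetical sample: two data rows with three columns, followed by an empty row.
$lines = @(
    'a,b,c',
    'd,e,f',
    ',,'
)

# Supply five names for three columns; the surplus properties (C4, C5) stay empty.
$rows = $lines | ConvertFrom-Csv -Header C1, C2, C3, C4, C5 |
    Where-Object { $_.C1 -ne '' } |
    Select-Object -Property * -ExcludeProperty C3

$rows | Format-Table
```

The empty row is dropped by the filter on C1, and the surplus header names simply produce empty columns that can be excluded in the same way as C3.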

powershell concat all columns in a row

I have 20+ columns in a CSV file, like:
empid ename deptid mgrid hiredon col6 .... col20
10 a 10 5 10-may-2010
11 b 10 5 08-aug-2005
12 c 11 3 11-dec-2008
I would like to get the output as CSV, like:
empid, all_other_details
10 , {ename:a;deptid:10;mgrid:5; like this for all 19 columns }
Except for the employee id, all other columns should be wrapped into a string containing key:value pairs. Is there a way to join all the columns without mentioning each column as $_.<name>?
I have come up with this, I hope comments are self explanatory.
It should work with 2 or more columns.
Delimiters can be changed (on my computer, CSV delimiter is ; not , for example, and I know it can be different with other Cultures).
#declare delimiters
$CSVdelimiter = ";"
$detailsDelimiter = ","
#load file in array
$data = Get-Content "Book1.csv"
#isolate headers
$headers = $data[0].Split($CSVdelimiter)
#declare row counter
$rowCount = 0
#declare results array with headers
$results = @($headers[0] + "$CSVdelimiter`details")
#for each row except first
$data | Select-Object -Skip 1 | % {
#split on $csvDelimiter
$rowArray = $_.Split($CSVdelimiter)
#declare details array
$details = @()
#for each column except first
for($i = 1; $i -lt $rowArray.Count; $i++) {
#add to details array (header:value)
$details += $headers[$i] + ":" + $rowArray[$i]
}
#join details array with $detailsDelimiter to build new row
#append to first column value
#add to results array
$results += "$($rowArray[0])$CSVdelimiter{$($details -join $detailsDelimiter)}"
#increment row counter
$rowCount++
}
#output results to new csv file
$results | Out-File "Book2.csv"
Output looks like this :
empid;details
10;{ename:a,deptid:10,mgrid:5,hiredon:10-may-2010}
11;{ename:b,deptid:10,mgrid:5,hiredon:08-aug-2005}
12;{ename:c,deptid:11,mgrid:3,hiredon:11-dec-2008}
Try this:
$csv = Get-Content .\input_file.csv
$keys = $csv[0] -split '\s+'
$c = $keys.count - 1
$keys = ($keys[1..$c] | % {$i = -1}{$i += 1; "$($_):{$i}"}) -join '; '
$csv[1..($csv.count -1)] | % {
$a = $_ -split '\s+'
New-Object psobject -Property @{
empid = $a[0]
all_other_details = "{$($keys -f $a[1..$c])}"
}
} | Export-Csv output_file.csv -NoTypeInformation
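As an alternative sketch that avoids splitting lines by hand, the columns can be enumerated through each row's PSObject.Properties, so none of them has to be named except the key. This assumes a comma-delimited input; the sample rows and the `$rows`, `$pairs` and `$result` names are illustrative:

```powershell
# Hypothetical sample rows; in practice these would come from Import-Csv.
$rows = @'
empid,ename,deptid,mgrid
10,a,10,5
11,b,10,5
'@ | ConvertFrom-Csv

$result = foreach ($row in $rows) {
    # Turn every property except empid into a "name:value" pair.
    $pairs = $row.PSObject.Properties |
        Where-Object { $_.Name -ne 'empid' } |
        ForEach-Object { '{0}:{1}' -f $_.Name, $_.Value }

    [pscustomobject]@{
        empid             = $row.empid
        all_other_details = '{' + ($pairs -join ';') + '}'
    }
}

$result | ConvertTo-Csv -NoTypeInformation
```

Because the rows are parsed as CSV objects, quoted values containing the delimiter are handled by the parser, which a plain Split on the delimiter would not do.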

Loop through csv compare content with an array and then add content to csv

I don't know how to append a string to a CSV. What I am doing:
I have two csv files. One with a list of host-names and id's and another one with a list of host-names and some numbers.
Example file 1:
Hostname | ID
IWBW140004 | 3673234
IWBW130023 | 2335934
IWBW120065 | 1350213
Example file 2:
ServiceCode | Hostname | ID
4 | IWBW120065 |
4 | IWBW140004 |
4 | IWBW130023 |
Now I read the content of file 1 in a two dimensional array:
$pcMatrix = @(,@())
Import-Csv $outputFile |ForEach-Object {
foreach($property in $_.PSObject.Properties){
$pcMatrix += ,($property.Value.Split(";")[1],$property.Value.Split(";")[2])
}
}
Then I read the content of file 2 and compare it with my array:
Import-Csv $Group".csv" | ForEach-Object {
foreach($property in $_.PSObject.Properties){
for($i = 0; $i -lt $pcMatrix.Length; $i++){
if($pcMatrix[$i][0] -eq $property.Value.Split('"')[1]){
#Add-Content here
}
}
}
}
What do I need to do, to append $pcMatrix[$i][1] to the active column in file 2 in the row ID?
Thanks for your suggestions.
Yanick
It seems like you are over-complicating this task.
If I understand you correctly, you want to populate the ID column in file two, with the ID that corresponds to the correct hostname from file 1. The easiest way to do that, is to fill all the values from the first file into a HashTable and use that to lookup the ID for each row in the second file:
# Read the first file and populate the HashTable:
$File1 = Import-Csv .\file1.txt -Delimiter "|"
$LookupTable = @{}
$File1 |ForEach-Object {
$LookupTable[$_.Hostname] = $_.ID
}
# Now read the second file and update the ID values:
$File2 = Import-Csv .\file2.txt -Delimiter "|"
$File2 |ForEach-Object {
$_.ID = $LookupTable[$_.Hostname]
}
# Then write the updated rows back to a new CSV file:
$File2 | Export-CSV -Path .\file3.txt -NoTypeInformation -Delimiter "|"
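The same lookup can be tried without any files. The sketch below uses made-up inline data (including a hypothetical hostname, UNKNOWN01, that has no match) and adds a ContainsKey check so that unmatched rows are reported instead of silently left empty:

```powershell
# Hypothetical inline data standing in for the two files.
$File1 = @'
Hostname|ID
IWBW140004|3673234
IWBW130023|2335934
'@ | ConvertFrom-Csv -Delimiter '|'

$File2 = @'
ServiceCode|Hostname|ID
4|IWBW140004|
4|UNKNOWN01|
'@ | ConvertFrom-Csv -Delimiter '|'

# Build the hostname -> ID lookup table.
$LookupTable = @{}
foreach ($row in $File1) { $LookupTable[$row.Hostname] = $row.ID }

# Fill in IDs; warn and leave the field empty when a hostname has no match.
foreach ($row in $File2) {
    if ($LookupTable.ContainsKey($row.Hostname)) {
        $row.ID = $LookupTable[$row.Hostname]
    }
    else {
        Write-Warning "No ID found for $($row.Hostname)"
    }
}

$File2 | ConvertTo-Csv -NoTypeInformation -Delimiter '|'
```

Note that the sample deliberately has no spaces around the delimiter; if the real files contain ' | ' with spaces, the header names and values would need trimming first.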