I have a series of files that have changed some header naming and column counts over time. However, the files always have the first column as the start date and second column as the end date.
I would like to get just these two columns, but the name has changed over time.
What I have tried is this:
$FileContents=Import-CSV -Path "$InputFilePath"
foreach ($line in $FileContents)
{
$StartDate=$line[0]
$EndDate=$line[1]
}
...but $FileContents is (I believe) an array of a type (objects?) that I'm not sure how to positionally access in PowerShell. Any help would be appreciated.
Edit: The files switched from comma delimiter to pipe delimiter a while back and there are 1000s of files to work with, so I use Import-CSV because it can implicitly read either format.
You could use the -Header parameter to give the first two columns of the CSV the header names you want. Because -Header makes Import-Csv treat the original header line as data, you then skip that first row.
$FileContents = Import-CSV -Path "$InputFilePath" -Header "StartDate","EndDate" | Select-Object "StartDate","EndDate" -Skip 1
foreach ($line in $FileContents) {
$StartDate = $line.StartDate
$EndDate = $line.EndDate
}
Here's an example:
Example.csv
a,b,c
1,2,3
4,5,6
Import-CSV -Path Example.csv -Header "StartDate","EndDate" | Select-Object "StartDate","EndDate" -Skip 1
StartDate EndDate
--------- -------
1 2
4 5
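The -Header approach above relies on Import-Csv being given the right delimiter (it defaults to a comma). As a rough sketch for the mixed comma and pipe files mentioned in the question, the delimiter could be guessed from the header line first. The sample file, its contents, and the $delimiter variable are made up for illustration, and the check assumes a pipe never appears in the header of a comma-delimited file.

```powershell
# Hypothetical sample file standing in for one of the pipe-delimited inputs
$InputFilePath = Join-Path ([System.IO.Path]::GetTempPath()) 'dates-sample.csv'
Set-Content -Path $InputFilePath -Value 'Begin|Finish|Other', '2020-01-01|2020-01-31|x', '2020-02-01|2020-02-29|y'

# Assumption: a pipe in the first line means the file is pipe-delimited
$firstLine = Get-Content -Path $InputFilePath -TotalCount 1
$delimiter = if ($firstLine -like '*|*') { '|' } else { ',' }

# Name the first two columns ourselves and drop the original header row
$rows = Import-Csv -Path $InputFilePath -Delimiter $delimiter -Header 'StartDate', 'EndDate' |
    Select-Object -Property StartDate, EndDate -Skip 1
$rows
```

Columns beyond the two named in -Header are discarded, so files with different column counts can go through the same code.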
If you use Import-Csv, PowerShell will indeed create an object for each row. The "columns" are called properties. You can select properties with Select-Object, but you have to name the properties you want to select. Since you don't know the property names in advance, you can read them from the first object. Get-Member also lists them, but it sorts its output alphabetically, so it matches the column order only by coincidence. PSObject.Properties keeps the original order, so its first two entries match the first two columns in your CSV.
Use the following sample code and apply it to your script:
$csv = @'
1,2,3,4,5
a,b,c,d,e
g,h,i,j,k
'@
$csv = $csv | ConvertFrom-Csv
$properties = $csv[0].PSObject.Properties.Name | Select-Object -First 2
$csv | Select-Object -Property $properties
How about this:
$FileContents=get-content -Path "$InputFilePath"
for ($i=0;$i -lt $FileContents.count;$i++){
$textrow = ($FileContents[$i]).split(",")
$StartDate=$textrow[0]
$EndDate=$textrow[1]
#do what you want with the variables
write-host $startdate
write-host $EndDate
}
This assumes you are working with a comma-delimited CSV file.
Another solution with ForEach-Object (% is its alias) and -split:
Get-Content "example.csv" | select -skip 1 | %{$row=$_ -split ',', 3; [pscustomobject]@{NewCol1=$row[0];NewCol2=$row[1]}}
You can also build calculated properties into the select like this:
Get-Content "example.csv" | select @{N="Newcol1";E={($_ -split ',', 3)[0]}}, @{N="Newcol2";E={($_ -split ',', 3)[1]}} -skip 1
With ConvertFrom-Csv:
Get-Content "example.csv" | ConvertFrom-Csv -Delimiter ',' -Header col1, col2 | select -skip 1
I'm trying to import some CSV files to work with them further and export them in the end. They all have two header lines, of which I'll only need the second one. I also need to delete most columns except a few. Unfortunately it seems you need to decide whether you want to skip rows with Get-Content or exclude columns with Import-Csv. Neither of those can do both, so I came up with a workaround:
$out="bla\bla\out.csv"
$in="bla\bla\in.csv"
$header= (get-content $in -TotalCount 2 )[-1]
$out = Import-csv $in -Header $header -Delimiter ";"|select column1 | Export-Csv -Path $out -NoTypeInformation
This returns an empty CSV with the header name column1. What am I doing wrong?
Edit:
The input csv looks like:
filename;filename;...
column1;column2;...
1;a;...
2;b;...
...
I guess that -Header can't read arrays without single quotation marks, so I'm trying to find a solution to that at the moment.
If you know the name of the header you want to filter on, the following should do the trick and only requires reading the file once:
$out = "out.csv"
$in = "in.csv"
Get-Content $in | Select-Object -Skip 1 |
ConvertFrom-Csv -Delimiter ';' | Select-Object column1 |
Export-Csv $out -NoTypeInformation
If however, you don't know the name of the header you need to filter on (column1 on example above) but you know it's the first column, it would require an extra step:
$csv = Get-Content $in | Select-Object -Skip 1 | ConvertFrom-Csv -Delimiter ';'
$csv | Select-Object $csv[0].PSObject.Properties.Name[0] | Export-Csv $out -NoTypeInformation
We can take the first object of the object array ($csv[0]), get its properties through PSObject.Properties, and then select the name of the first property (.Name[0], which is column1 in this case).
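As for the original attempt, one likely reason for the empty column is that -Header received the whole second line as a single string, so Import-Csv saw one column named "column1;column2;..." rather than separate columns. A minimal sketch of keeping Import-Csv, assuming the header names contain no quoted semicolons; the sample file and its contents are made up for illustration:

```powershell
# Hypothetical sample file with two header lines, standing in for in.csv
$in = Join-Path ([System.IO.Path]::GetTempPath()) 'in-sample.csv'
Set-Content -Path $in -Value 'filename;filename', 'column1;column2', '1;a', '2;b'

# The second line holds the real header; split it into an array of names
$header = (Get-Content -Path $in -TotalCount 2)[-1] -split ';'

# With -Header both original header lines come back as data, so skip them
$rows = Import-Csv -Path $in -Header $header -Delimiter ';' |
    Select-Object -Skip 2 |
    Select-Object -Property column1
$rows
```

The result can then be piped to Export-Csv as in the examples above.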
I have two CSV files like this:
CSV1:
Name
test;
test & example;
test & example & testagain;
CSV2:
Name
test1;
test&example;
test & example&testagain;
I want to compare each line of CSV1 with each line of CSV2 and, if the first 5 letters match, write the result.
I'm able to compare them, but only if they match exactly:
$CSV1 = Import-Csv -Path ".\client.csv" -Delimiter ";"
$CSV2 = Import-Csv ".\client1.csv" -Delimiter ";"
foreach ($record in $CSV1) {
$result = $CSV2 | Where {$_.name -like $record.name}
$result
}
You can do so with Compare-Object and a custom property definition.
Compare-Object $CSV1 $CSV2 -Property {$_.name -replace '^(.{5}).*', '$1'} -PassThru
$_.name -replace '^(.{5}).*', '$1' will take the first 5 characters from the property name (or less if the string is shorter than 5 characters) and remove the rest. This property is then used for comparing the records from $CSV1 and $CSV2. The parameter -PassThru makes the cmdlet emit the original data rather than objects with just the custom property. In theory you could also use $_.name.Substring(0, 5) instead of a regular expression replacement for extracting the first 5 characters. However, that would throw an error if the name is shorter than 5 characters like in the first record from $CSV1.
By default Compare-Object outputs the differences between the input objects, so you also need to add the parameters -IncludeEqual and -ExcludeDifferent to get just the matching records.
Pipe the result through Select-Object * -Exclude SideIndicator to remove the property SideIndicator from the output.
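Put together, the pieces described above might look like the following sketch. The in-memory records and the $prefix and $matching variable names are made up for illustration and stand in for the imported CSV files.

```powershell
# Hypothetical in-memory records standing in for $CSV1 and $CSV2
$CSV1 = 'Name', 'alpha one', 'bravo', 'abc' | ConvertFrom-Csv
$CSV2 = 'Name', 'alpha two', 'charlie', 'abc' | ConvertFrom-Csv

# Compare on the first 5 characters (shorter names are compared as a whole)
$prefix = { $_.name -replace '^(.{5}).*', '$1' }

# Keep only records whose prefix occurs on both sides, then drop SideIndicator
$matching = Compare-Object $CSV1 $CSV2 -Property $prefix -PassThru -IncludeEqual -ExcludeDifferent |
    Select-Object -Property * -ExcludeProperty SideIndicator
$matching
```

Note that Compare-Object pairs records one to one, so if several records on one side share the same prefix, not all of them are necessarily reported as equal.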
foreach ($record in $CSV1) {
    $CSV2 | Where {"$($_.name)12345".SubString(0, 5) -eq "$($record.name)12345".SubString(0, 5)} |
        ForEach {[PSCustomObject]@{Name1 = $Record.Name; Name2 = $_.Name}}
}
or:
... | Where {($_.name[0..4] -Join '') -eq ($record.name[0..4] -Join '')} | ...
Using this Join-Object cmdlet:
$CSV1 | Join $CSV2 `
    -Using {($Left.name[0..4] -Join '') -eq ($Right.name[0..4] -Join '')} `
    -Property @{Name1 = {$Left.Name}; Name2 = {$Right.Name}}
All the above result in:
Name1                       Name2
-----                       -----
test & example;             test & example&testagain;
test & example & testagain; test & example&testagain;
I need to work with CSV files in PowerShell that have a duplicate column header. Why they have a duplicate column is beyond me. Such is life.
I want to use Import-Csv so that I can easily deal with the data, but since the duplicate column exists I get this error:
Import-Csv : The member "PROC STAT" is already present.
At C:\Users\MyName\Documents\SomeFolder\testScript1.ps1:10 char:9
+ $csv2 = Import-Csv $files[0].FullName
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [Import-Csv], ExtendedTypeSystemException
+ FullyQualifiedErrorId : AlreadyPresentPSMemberInfoInternalCollectionAdd,Microsoft.PowerShell.Commands.ImportCsvCommand
I could manually fix the problem by going into every CSV file and deleting the duplicate column. But this is not an option. There are hundreds of them, and the script needs to be run periodically. Ideally I am looking for a way to programmatically remove that column (Import-Csv won't work) or programmatically change the name of the column (so that I can then Import-Csv and delete it). Any suggestions?
My code to loop through all the files:
$files = Get-ChildItem "C:\Users\MyName\Documents\SomeFolder\Data" -Filter *.csv
foreach($file in $files) {
$csv = Import-Csv $file.FullName
}
You can specify custom header names with the Header parameter:
Import-Csv .\file.csv -Header header1,header2,header3
This will treat the original header line as a normal row, so skip the first output object with Select-Object:
Import-Csv .\file.csv -Header header1,header2,header3 |Select-Object -Skip 1
I ran into this a few times as well and wrote this as a workaround. It works with any CSV, even if all or multiple column names are the same.
function Import-DuplicateHeaderCSV {
    <#
    .SYNOPSIS
    Workaround function for the PowerShell error: "Import-Csv : The member "column_name" is already present."
    This error is returned when attempting to use the Import-Csv cmdlet on a CSV which has duplicate column names.
    .DESCRIPTION
    The headers are looped through, read in, and parsed into an array.
    Duplicate headers are stored in a hash table, e.g. @{columnName = numOccurrences}.
    Multiple occurrences of the header are supported by incrementing the value in the hash table for each occurrence.
    The duplicate header is then inserted into the array as columnName_COPYnumOccurrences.
    Import-Csv is then used normally with the new column header array as the -Header parameter.
    .PARAMETER Path
    The full file path
    e.g. "C:\users\johndoe\desktop\myfile.csv"
    #>
    param(
        [Parameter(Mandatory=$true)] [string] $Path
    )
    # Parse the first line of the file into properties named P1..Pn
    $headerRow = Get-Content $Path | ConvertFrom-String -Delimiter "," | Select-Object -First 1
    $objectSize = ($headerRow | Get-Member -MemberType NoteProperty | Measure-Object).Count
    $headers = @()
    $duplicates = @{}
    for ($i = 1; $i -le $objectSize; $i++) {
        $name = $headerRow."P$i"
        if ($headers -notcontains $name) {
            $headers += $name
        } else {
            # Count this repeat and append a _COPYn suffix to keep the name unique
            if ($duplicates[$name] -gt 0) {
                $duplicates[$name]++
            } else {
                $duplicates[$name] = 1
            }
            $headers += "$($name)_COPY$($duplicates[$name])"
        }
    }
    # Import with the de-duplicated names; the original header line comes back as the first record
    $data = Import-Csv -Path $Path -Header $headers
    return $data
}
You can load the data with Get-Content and convert it like this:
Get-Content "C:\temp\test.csv" | ConvertFrom-String -Delimiter "," | select -Skip 1
short version:
gc "C:\temp\test.csv" | cfs -D "," | select -Skip 1
If you don't want the columns to be named automatically, you can name them manually like this:
gc "C:\temp\test.csv" | cfs -D "," -PropertyNames head1, head2, head3 | select -Skip 1
Here's an example of how to do it without needing to hard-code the column header names in the code (i.e., dynamically generate a generic header based on the number of columns in the CSV file):
$csvFile = "test.csv"
# Count columns in CSV file
$columnCount = (Get-Content $csvFile |
Select-Object -Index 1,2 |
ConvertFrom-Csv |
Get-Member -MemberType NoteProperty |
Measure-Object).Count
# Create list of generic property names (no duplicates)
$propertyNames = 1..$columnCount |
ForEach-Object { "Property{0}" -f $_ }
# Get CSV file content, skip header line, and convert from CSV using generic header
Get-Content $csvFile |
Select-Object -Skip 1 |
ConvertFrom-Csv -Header $propertyNames
One caveat with this solution is that the CSV file must have at least two rows of data (not counting the header line).
I have multiple CSV files that need to be merged to one. In every single CSV file there is a header and in the second row some text that I don't need.
I noticed the | Select -Skip 1 statement for the headers. Now I was wondering how I can skip the 3rd row?
I tried this, but this gives me an empty file
Get-ChildItem -Path $CSVFolder -Recurse -Filter "*.csv" | %{
Import-Csv $_.FullName -Header header1, header3, header4 |
Select -Skip 1 | Select -Skip 2
} | Export-Csv "C:\Export\result.csv" -NoTypeInformation
Select-Object doesn't allow you to skip arbitrary rows in between other rows. If you want to remove a particular row from a text input file, you can do so with a counter. The counter is zero-based, so the example below drops the fourth record:
$cnt = 0
Import-Csv 'C:\path\to\input.csv' |
Where-Object { ($cnt++) -ne 3 } |
Export-Csv 'C:\path\to\output.csv' -NoType
If the records in your input CSV don't have nested line breaks you could also use Get-Content/Set-Content, which is probably a little faster than Import-Csv/Export-Csv (due to less parsing overhead). Increase the line number you want to skip by one to account for the header line.
$cnt = 0
Get-Content 'C:\path\to\input.csv' |
Where-Object { ($cnt++) -ne 4 } |
Set-Content 'C:\path\to\output.csv'
try this
$i=0;
import-csv "C:\temp2\missing.csv" | %{$i++; if ($i -ne 3) {$_}} | export-csv "C:\temp2\result.csv" -NoTypeInformation
If all you are doing is skipping the first three rows in all use cases, just use -Skip 3.
Get-Content -Path 'D:\Temp\UserRecord.csv'
# Results
<#
Name Codes
------- ---------
John AJFKC,EFUY
Ben EFOID, EIUF
Alex OIPORE, OUOIJE
#>
# Return all text after the header row and row 3
(Get-Content -Path 'D:\Temp\UserRecord.csv') |
Select -Skip 3
# Results
<#
Ben EFOID, EIUF
Alex OIPORE, OUOIJE
#>
See also:
Parsing Text with PowerShell (1/3)
How do I read only the header from a CSV file and write the column names into an array?
I have found a solution using following cmdlets:
$obj = Import-Csv '.\users.csv' -Delimiter ';'
$headerarray = ($obj | Get-member -MemberType 'NoteProperty' | Select-Object -ExpandProperty 'Name')
But the problem is that the names are automatically sorted alphabetically.
Does anyone have a solution for this?
You can get the column names of a CSV file like this:
import-csv <csvfilename> |
select-object -first 1 | foreach-object { $_.PSObject.Properties } |
select-object -expandproperty Name
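If only the header line should be read, without importing the rest of the file, a rough alternative is to read the first line as text and split it. This is a sketch under the assumption that no header name contains the delimiter inside quotes. The sample file, its contents, and the $headerArray name are made up for illustration.

```powershell
# Hypothetical sample file standing in for users.csv
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'users-sample.csv'
Set-Content -Path $path -Value 'Name;Department;"Account"', 'alice;IT;a01'

# Read only the first line, split on the delimiter, and strip optional quotes
$headerArray = (Get-Content -Path $path -TotalCount 1) -split ';' |
    ForEach-Object { $_.Trim('"') }
$headerArray
```

The names come back in file order, because nothing in this pipeline sorts them.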