How can I count the number of CSV columns when the file has multiline data and no header - powershell

My CSV files have no headers and multi line entries like this:
11;"multi line
col12";13;foobar;foobar
21;22;23;24;25
And I'd like to count the number of columns. So 5 in this example. How do I do that?
What I tried:
Import-CSV doesn't work without the header parameter due to duplicate entries on the first line.
(Import-Csv .\bad.csv -Delimiter ";" | get-member -type NoteProperty).count
Adding a header parameter skews the count.
(Import-Csv .\bad.csv -Delimiter ";" -Header (1..99) | get-member -type NoteProperty).count
I gave up on reading the file manually via Get-Content because of all the parsing I would have to handle myself: escape characters, multiline entries, and so on.
My version of PowerShell is 3 and I have to port my script to version 2 later on.

If you are willing to accept the caveat that this could miscount the columns when a quoted string contains the delimiter, this could be good enough for you.
$path = "c:\temp\test.txt"
$delimiter = ";"
$numberOfColumns = Get-Content $path |
ForEach-Object{($_.split($delimiter)).Count} |
Measure-Object -Maximum |
Select-Object -ExpandProperty Maximum
Import-Csv $path -Header (1..$numberOfColumns) -Delimiter $delimiter
Read in the file with Get-Content and find the maximum number of columns by splitting each line on the delimiter, then use that value to import the CSV. If the file is large, you can read it in once with Get-Content and then use ConvertFrom-Csv once you know your column count.
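For example, a minimal sketch of that read-once variant, reusing the $path and $delimiter values from above:
# Read the file once and reuse the lines for both the count and the conversion
$lines = Get-Content $path
$numberOfColumns = ($lines | ForEach-Object{($_.split($delimiter)).Count} | Measure-Object -Maximum).Maximum
$lines | Out-String | ConvertFrom-Csv -Header (1..$numberOfColumns) -Delimiter $delimiter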
If every line contained an embedded line break, the above logic would fail. Still, we can temporarily scrub the data, removing the line breaks inside quoted fields, in order to get an accurate count.
$delimiter = ";"
$fileData = (Get-Content $path | Out-String)
$numberOfColumns = ((($fileData -replace "(`"[^;]+?)`r`n",'$1') -split "`r`n" | Select -First 1).split($delimiter)).Count
$fileData | ConvertFrom-Csv -Header (1..$numberOfColumns) -Delimiter $delimiter
What this will do is find lines that end where there is a double quote followed by text that does not contain the delimiter. We also match the newline that follows, but drop that newline in the replacement. Once that is done, we know the first line is complete, so we split and count it just like before.
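To illustrate on the question's sample data (a throwaway demonstration; the string below stands in for the file contents):
$sample = "11;`"multi line`r`ncol12`";13;foobar;foobar`r`n21;22;23;24;25"
# The scrub removes the line break inside the quoted field, so the first physical line is whole again
$firstLine = ($sample -replace "(`"[^;]+?)`r`n",'$1') -split "`r`n" | Select-Object -First 1
$firstLine # 11;"multi linecol12";13;foobar;foobar
($firstLine -split ";").Count # 5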

Since Excel knows, let's ask it:
$path = "path\to\bad.csv"
$excel = New-Object -ComObject Excel.Application
$workbook = $excel.Workbooks.Open($path)
$sheet = $workbook.ActiveSheet
$columnIndex = 1
while($sheet.Cells.Item(1, $columnIndex).Text -ne "") {
$columnIndex++
}
"There are $($columnIndex - 1) columns in CSV file $path"
Start-Sleep -Seconds 1
Get-Process excel | Stop-Process -Force
As pointed out by Ansgar Wiechers in the comments, there is a much shorter solution:
$path = "path\to\bad.csv"
$excel = New-Object -ComObject Excel.Application
$workbook = $excel.Workbooks.Open($path)
$sheet = $workbook.ActiveSheet
$columnCount = $sheet.UsedRange.Columns.Count
"There are $columnCount columns in CSV file $path"
Start-Sleep -Seconds 1
Get-Process excel | Stop-Process -Force
(I know my way of killing Excel is dirty, but as far as I remember, doing it cleanly takes a lot more code.)
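For reference, the cleaner teardown looks roughly like this (a sketch; the part that is easy to get wrong is releasing every COM reference you touched):
$workbook.Close($false) # close without saving
$excel.Quit()
# Release the COM references so the EXCEL.EXE process can actually exit
[void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($sheet)
[void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($workbook)
[void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($excel)
[GC]::Collect()
[GC]::WaitForPendingFinalizers()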

I know this is very old, but I came across a similar situation today (I did not have rows of varying column counts) and found my own solution, so I thought I would share it for anyone else running into this. My solution was to use Get-Content for the first row of the CSV and -split on the delimiter (,) to create an array, then return the count of the array. As mentioned in the answers above, this will not account for delimiters existing within quotations.
((Get-Content $PathToCsv)[0] -split ",").count

I had the same issue and went with AAgent's suggestion.
$CommaCount = ((Get-Content $PathToCsv)[0] -split ",").count
$SemicolonCount = ((Get-Content $PathToCsv)[0] -split ";").count
if ($CommaCount -gt $SemicolonCount){
$CMSlist = Import-Csv ($PathToCsv) -Delimiter ","
}
else{
$CMSlist = Import-Csv ($PathToCsv) -Delimiter ";"
}

Related

Remove Columns in multiple CSVs POWERSHELL [duplicate]

I need to remove several columns from a CSV file without importing the CSV file in Powershell. Below is an example of my input CSV and what I hope the output CSV can look like.
Input.csv
A,1,2,3,4,5
B,6,7,8,9,10
C,11,12,13,14,15
D,15,16,17,18,19,20
Idealoutput.csv
A,3,5
B,8,10
C,13,15
D,17,20
I have tried doing this with the following code, but it is giving me plenty of errors, saying that I cannot use the "Delete" method this way (which I have done in the past)... Any ideas?
$Workbook1 = $Excel.Workbooks.open($file.FullName)
$header = $Workbook1.ActiveSheet.Range("A1:A68").EntireRow
$unneededcolumns1 = $Workbook1.ActiveSheet.Range("A1:O1").EntireColumn
$unneededcolumns2 = $Workbook1.ActiveSheet.Range("B1:K1").EntireColumn
$unneededcolumns3 = $Workbook1.ActiveSheet.Range("F1:I1").EntireColumn
$unneededcolumns4 = $Workbook1.ActiveSheet.Range("G1:I1").EntireColumn
$unneededcolumns5 = $Workbook1.ActiveSheet.Range("H1:O1").EntireColumn
$unneededcolumns6 = $Workbook1.ActiveSheet.Range("J1:AL1").EntireColumn
$unneededcolumns7 = $Workbook1.ActiveSheet.Range("K1").EntireColumn
$unneededcolumns8 = $Workbook1.ActiveSheet.Range("L1:AK1").EntireColumn
$unneededcolumns9 = $Workbook1.ActiveSheet.Range("F1:I1").EntireColumn
$unneededcolumns10 = $Workbook1.ActiveSheet.Range("M1:AB1").EntireColumn
$unneededcolumns11 = $Workbook1.ActiveSheet.Range("N1:X1").EntireColumn
$unneededcolumns12 = $Workbook1.ActiveSheet.Range("O1:BA1").EntireColumn
$unneededcolumns13 = $Workbook1.ActiveSheet.Range("P1:U1").EntireColumn
$header.Delete()
$unneededcolumns1.Delete()
$unneededcolumns2.Delete()
$unneededcolumns3.Delete()
$unneededcolumns4.Delete()
$unneededcolumns5.Delete()
$unneededcolumns6.Delete()
$unneededcolumns7.Delete()
$unneededcolumns8.Delete()
$unneededcolumns9.Delete()
$unneededcolumns10.Delete()
$unneededcolumns11.Delete()
$unneededcolumns12.Delete()
$unneededcolumns13.Delete()
$Workbook1.SaveAs("\\output.csv")
I am just going to add this anyway, since I hope to convince you how easy it is to avoid having to use Excel.
$source = "c:\temp\file.csv"
$destination = "C:\temp\newfile.csv"
(Import-CSV $source -Header 1,2,3,4,5,6 |
Select "1","4","6" |
ConvertTo-Csv -NoTypeInformation |
Select-Object -Skip 1) -replace '"' | Set-Content $destination
We assign arbitrary headers to the object so that we can refer to the 1st, 4th and 6th columns by position. Once exported, the file will have the following contents, which match what I think you want and not what you had in the question. Your last line had an extra value (20) on it; I don't know whether that was on purpose or not.
A,3,5
B,8,10
C,13,15
D,17,19
If this is not viable I am really interested as to why.
Excel Approach
Alright, so the file is enormous and Import-CSV is not a viable option. Keeping with your Excel idea, I came up with this. What it will do is take column indices and delete any column that is not in those indices.
Wait, you say, that won't work, since the column indices change as you remove columns. Using the indices we want to keep, we compute the inverse set to delete, based on the UsedRange of the sheet. We then take each of those column indices to delete and subtract a value equal to its array position. The reason is that by the time a column is actually deleted, the remaining indices have already shifted to account for the earlier deletions.
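Here is the adjustment in isolation, with hypothetical indices (a quick sanity check, not part of the script):
$ColumnsToRemove = 2,3,5
# Subtract each element's array position: 2,3,5 becomes 2,2,3
0..($ColumnsToRemove.Count - 1) | ForEach-Object{$ColumnsToRemove[$_] = $ColumnsToRemove[$_] - $_}
$ColumnsToRemove
# Deleting column 2, then 2 again, then 3 removes the original columns 2, 3 and 5
The full script: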
$file = "c:\temp\file.csv"
$ColumnsToKeep = 1,4,6
# Create the com object
$excel = New-Object -comobject Excel.Application
$excel.DisplayAlerts = $False
$excel.visible = $False
# Open the CSV File
$workbook = $excel.Workbooks.Open($file)
$sheet = $workbook.Sheets.Item(1)
# Determine the number of rows in use
$maxColumns = $sheet.UsedRange.Columns.Count
$ColumnsToRemove = Compare-Object $ColumnsToKeep (1..$maxColumns) | Where-Object{$_.SideIndicator -eq "=>"} | Select-Object -ExpandProperty InputObject
0..($ColumnsToRemove.Count - 1) | %{$ColumnsToRemove[$_] = $ColumnsToRemove[$_] - $_}
$ColumnsToRemove | ForEach-Object{
[void]$sheet.Cells.Item(1,$_).EntireColumn.Delete()
}
# Save the edited file
$workbook.SaveAs("C:\temp\newfile.csv", 6) # 6 = xlCSV format
# Close excel and release the com object.
$workbook.Close($true)
$excel.Quit()
[void][System.Runtime.Interopservices.Marshal]::ReleaseComObject($excel)
Remove-Variable excel
I was having issues with Excel remaining open even after reading up on the "correct" way to close it. The inner logic is what matters here. Don't forget to change the paths as needed.
Here's a better approach that I use, though it's not the most performant on large files. Both variants below have been tested on 1GB files.
Powershell:
Import-Csv '.\inputfile.csv' |
select ColumnName1,ColumnName2,ColumnName3 |
Export-Csv -Path .\outputfile.csv -NoTypeInformation
https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/export-csv?view=powershell-5.1
If you want to get rid of those pesky quotes that the tool adds, upgrade to Powershell 7.
Powershell 7+:
Import-Csv '.\inputfile.csv'
| select ColumnName1,ColumnName2,ColumnName3
| Export-Csv -Path .\outputfile.csv -NoTypeInformation -UseQuotes Never
https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/export-csv?view=powershell-7

How to separate CSV values within a CSV into new rows in PowerShell

I'm receiving an automated report from a system that cannot be modified as a CSV. I am using PowerShell to split the CSV into multiple files and parse out the specific data needed. The CSV contains columns that may contain no data, 1 value, or multiple values that are comma separated within the CSV file itself.
Example(UPDATED FOR CLARITY):
"Group","Members"
"Event","362403"
"Risk","324542, 340668, 292196"
"Approval","AA-334454, 344366, 323570, 322827, 360225, 358850, 345935"
"ITS","345935, 358850"
"Services",""
I want the data to have one entry per line like this (UPDATED FOR CLARITY):
"Group","Members"
"Event","362403"
"Risk","324542"
"Risk","340668"
"Risk","292196"
#etc.
I've tried splitting the data and I just get an unknown number of columns at the end.
I tried a foreach loop, but can't seem to get it right (pseudocode below):
Import-CSV $Groups
ForEach ($line in $Groups){
If($_.'Members'.count -gt 1, add-content "$_.Group,$_.Members[2]",)}
I appreciate any help you can provide. I've searched all the stackexchange posts and used Google but haven't been able to find something that addresses this exact issue.
Import-Csv .\input.csv | ForEach-Object {
ForEach ($Member in ($_.Members -Split ',')) {
[PSCustomObject]@{Group = $_.Group; Member = $Member.Trim()}
}
} | Export-Csv .\output.csv -NoTypeInformation
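Run against the sample input, this yields one row per member (the Member column name comes from the property used in the PSCustomObject):
"Group","Member"
"Event","362403"
"Risk","324542"
"Risk","340668"
"Risk","292196"
#etc. Note that the empty "Services" value still comes through as a single row with an empty Member.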
# Get the raw text contents
$CsvContents = Get-Content "\path\to\file.csv"
# Convert it to a table object
$CsvData = ConvertFrom-CSV -InputObject $CsvContents
# Iterate through the records in the table
ForEach ($Record in $CsvData) {
# Create array from the members values at commas & trim whitespace
$Record.Members -Split "," | % {
$Member = $_.Trim()
# Skip empty member values
if($Member) {
# Create our output string
$OutputString = "$($Record.Group), $Member"
# Write our output string to a file
Add-Content -Path "\path\to\output.txt" -Value $OutputString
}
}
}
This should work, you had the right idea but I think you may have been encountering some syntax issues. Let me know if you have questions :)
Revised the code as per your updated question,
$List = Import-Csv "\path\to\input.csv"
foreach ($row in $List) {
$Group = $row.Group
$Members = $row.Members -split ","
# Process for each value in Members
foreach ($MemberValue in $Members) {
# PS v3 and above: emit an object so Export-Csv writes real columns
[PSCustomObject]@{Group = $Group; Members = $MemberValue} | Export-Csv "\path\to\output.csv" -NoTypeInformation -Append
# PS v2
# $Group + "," + $MemberValue | Out-File "\path\to\output.csv" -Append
}
}

Powershell skip first 2 lines of txt file when importing it

I have a powershell script designed to read a txt file on a remote server and import it into SQL.
I want to be able to skip the first 2 lines of the txt file. I am currently using the code below to import the file. The txt file is delimited
$datatable = new-object System.Data.DataTable
$reader = New-Object System.IO.StreamReader($empFile)
$columns = (Get-Content $empfile -First 1).Split($empFileDelimiter)
if ($FirstRowColumnNames -eq $true)
{
$null = $reader.readLine()
}
foreach ($column in $columns)
{
$null = $datatable.Columns.Add()
}
# Read in the data, line by line, not column by column
while (($line = $reader.ReadLine()) -ne $null)
{
$null = $datatable.Rows.Add($line.Split($empFileDelimiter))
}
The $columns variable takes the first line of the txt file and creates the columns for the PowerShell DataTable.
The problem I have is that the first two lines of the txt file are not needed; I need to skip them and use the third line of the txt file for the columns. I have the following line of code which will do this, but I am uncertain how to integrate it into my code.
get-content $empFile | select-object -skip 2
Create an array for the $empfile without the first two lines, then use the first item of the array for the Columns, like this:
$Content = Get-Content $empFile | Select-Object -Skip 2
$columns = $Content[0].Split($empFileDelimiter)
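Integrated into the question's script, that looks like this (a sketch reusing the question's own variable names):
$datatable = new-object System.Data.DataTable
$reader = New-Object System.IO.StreamReader($empFile)
# Take the third line of the file for the columns
$Content = Get-Content $empFile | Select-Object -Skip 2
$columns = $Content[0].Split($empFileDelimiter)
# Advance the reader past the same two unneeded lines
$null = $reader.ReadLine()
$null = $reader.ReadLine()
if ($FirstRowColumnNames -eq $true)
{
$null = $reader.readLine()
}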
Just a quick one-liner:
(Get-Content $empFile| Select-Object -Skip 2) | Set-Content $empFile
Put in two unused calls to ReadLine(). Something like this:
$datatable = new-object System.Data.DataTable
$reader = New-Object System.IO.StreamReader($empFile)
$null = $reader.ReadLine()
$null = $reader.ReadLine()
$columns = ($reader.ReadLine()).Split($empFileDelimiter)
...

Powershell .csv merge with column remove

Using the code below I am able to merge several .csv files in 5 seconds.
$getFirstLine = $true
get-childItem "C:\my\dir\*.csv" | foreach {
$filePath = $_
$lines = Get-Content $filePath
$linesToWrite = switch($getFirstLine) {
$true {$lines}
$false {$lines | Select -Skip 1}
}
$getFirstLine = $false
Add-Content "C:\my\dir\output_code2.csv" $linesToWrite
}
I would like to take this one step further, preferable using piping to remove several of the columns using a command like:
select DateAndTime,DG1_KW,DG2_KW,WT_KW,HTR1_KW,POSS_Load_KW,INV1_KW,INV2_SOC|Export-csv output_test.csv -Notypeinformation
that being the variables in the header of each file.
How would I modify this code to make this work? The idea here is that I am going to be working with hundreds up to thousands of files.
I have other code which can do this, but it is nowhere near as fast.
For instance, using 10 .csv files that are 450 KB each, the code below takes 20 seconds to process and spit out a .csv file, removing 48 of the 56 columns and leaving the variables I need. If I remove the part of the code that trims the columns, it still takes 12+ seconds.
# Directory containing csv files, include *.*
$directory = "C:\my\dir\*.*";
# Get the csv files
$csvFiles = Get-ChildItem -Path $directory -Filter *.csv;
#$content = $null;
$content = @();
# Process each file
foreach($csv in $csvFiles)
{
$content += Import-Csv $csv;
}
# Write a datetime stamped csv file
$datetime = Get-Date -Format "yyyyMMddhhmmss";
$content |Export-Csv -Path "C:\my\dir\output_code2_$datetime.csv" -NoTypeInformation;
The code I would like to modify runs those same 10 files in 5 seconds but does not remove the 48 columns.
Any Ideas guys?
Ok, you want an example... Let's say your CSVs always look like this:
Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8,Col9,Col10
data1,data2,data3,data4,data5,data6,data7,data8,data9,data10
dataA,dataB,dataC,dataD,dataE,dataF,dataG,dataH,dataI,dataJ
Now let's say you only want Col1, Col2, Col6, Col9, and Col10. You could do a RegEx replace something like:
$Files = get-childItem "C:\my\dir\*.csv" | Select -Expand FullName
$SkipFirst = $false # set to $true to drop the first line of each file
ForEach($File in $Files){
If($SkipFirst){
Get-Content $File | Select -Skip 1 | ForEach{$_ -replace "^((?:.*?\,){2})(?:.*\,){3}(.*?\,)(?:(?:.*?\,){2})(.*?,.*?)$", '$1$2$3'} | Add-Content "C:\my\dir\output_code2.csv"
}Else{
Get-Content $File | ForEach{$_ -replace "^((?:.*?\,){2})(?:.*\,){3}(.*?\,)(?:(?:.*?\,){2})(.*?,.*?)$", '$1$2$3'} | Add-Content "C:\my\dir\output_code2.csv"
}
}
That would extract just the columns that I noted above. See https://regex101.com/r/jY4oO6/1 for a detailed breakdown of the RegEx string. The effective output would be (skipping the first line if so dictated):
Col1,Col2,Col6,Col9,Col10
data1,data2,data6,data9,data10
dataA,dataB,dataF,dataI,dataJ

Powershell - reading ahead and While

I have a text file in the following format:
.....
ENTRY,PartNumber1,,,
FIELD,IntCode,123456
...
FIELD,MFRPartNumber,ABC123,,,
...
FIELD,XPARTNUMBER,ABC123
...
FIELD,InternalPartNumber,3214567
...
ENTRY,PartNumber2,,,
...
...
the ... indicates there is other data between these fields. The ONLY thing I can be certain of is that a line starting with ENTRY begins a new set of records. The rows starting with FIELD can be in any order, and not all of them may be present in each group of data.
I need to:
1. Read in a chunk of data.
2. Search for any field matching the string ABC123.
3. If ABC123 is found, search for the existence of the InternalPartNumber field and return that row of data.
I have not seen a way to use Get-Content that can read in a variable number of rows as a set & be able to search it.
Here is the code I currently have, which will read a file, searching for a string & replacing it with another. I hope this can be modified to be used in this case.
$ftype = "*.txt"
$fnames = gci -Path $filefolder1 -Filter $ftype -Recurse|% {$_.FullName}
$mfgPartlist = Import-Csv -Path "C:\test\mfrPartList.csv"
foreach ($file in $fnames) {
$contents = Get-Content -Path $file
foreach ($partnbr in $mfgPartlist) {
$oldString = $partnbr.OldValue
$newString = $partnbr.NewValue
if (Select-String -Path $file -SimpleMatch $oldString -Debug -Quiet) {
$stringData = $contents -imatch $oldString
$stringData = $stringData -replace "[\n\r]","|"
foreach ($dataline in $stringData) {
$file +"|"+$stringData+"|"+$oldString+"|"+$newString|Out-File "C:\test\Datachanges.txt" -Width 2000 -Append
}
$contents = $contents -replace $oldString, $newString
Set-Content -Path $file -Value $contents
}
}
}
Is there a way to read & search a text file in "chunks" using Powershell? Or to do a Read-ahead & determine what to search?
Assuming your file isn't too big to read into memory all at once:
$Text = Get-Content testfile.txt -Raw
($Text -split '(?ms)^(?=ENTRY)') |
foreach {
if ($_ -match '(?ms)^FIELD\S+ABC123')
{$_ -replace '(?ms).+(^Field\S+InternalPartNumber.+?$).+','$1'}
}
FIELD,InternalPartNumber,3214567
That reads the entire file in as a single multiline string, and then splits it at the beginning of any line that starts with 'ENTRY'. Then it tests each segment for a FIELD line that contains 'ABC123', and if it does, removes everything except the FIELD line for the InternalPartNumber.
This is not my best work, as I have just got back from vacation. You could use a while loop that reads the text and sets an entry flag to gobble up the text in chunks. However, if your files are not too big, you could just read the whole text file at once and use regex to split up the chunks, then process accordingly.
$pattern = "ABC123"
$matchedRowToReturn = "InternalPartNumber"
$fileData = Get-Content "d:\temp\test.txt" | Where-Object{$_ -match '^(entry|field)'} | Out-String
$parts = $fileData | Select-String '(?smi)(^Entry).*?(?=^Entry|\Z)' -AllMatches | Select-Object -ExpandProperty Matches | Select-Object -ExpandProperty Value
$parts | Where-Object{$_ -match $pattern} | Select-String "$matchedRowToReturn.*$" | Select-Object -ExpandProperty Matches | Select-Object -ExpandProperty Value
What this will do is read in the text file as one long string, dropping any lines that are not ENTRY or FIELD related, and split it into chunks that start with lines beginning with the word "Entry".
Then we drop those "parts" that do not contain the $pattern. From the remaining matches we extract the InternalPartNumber line and present it.