Remove columns in multiple CSVs with PowerShell

I need to remove several columns from a CSV file without importing the CSV file in Powershell. Below is an example of my input CSV and what I hope the output CSV can look like.
Input.csv
A,1,2,3,4,5
B,6,7,8,9,10
C,11,12,13,14,15
D,15,16,17,18,19,20
Idealoutput.csv
A,3,5
B,8,10
C,13,15
D,17,20
I have tried doing this with the following code, but it gives me plenty of errors and says that I cannot use the "Delete" method this way (which I have done in the past). Any ideas?
$Workbook1 = $Excel.Workbooks.open($file.FullName)
$header = $Workbook1.ActiveSheet.Range("A1:A68").EntireRow
$unneededcolumns1 = $Workbook1.ActiveSheet.Range("A1:O1").EntireColumn
$unneededcolumns2 = $Workbook1.ActiveSheet.Range("B1:K1").EntireColumn
$unneededcolumns3 = $Workbook1.ActiveSheet.Range("F1:I1").EntireColumn
$unneededcolumns4 = $Workbook1.ActiveSheet.Range("G1:I1").EntireColumn
$unneededcolumns5 = $Workbook1.ActiveSheet.Range("H1:O1").EntireColumn
$unneededcolumns6 = $Workbook1.ActiveSheet.Range("J1:AL1").EntireColumn
$unneededcolumns7 = $Workbook1.ActiveSheet.Range("K1").EntireColumn
$unneededcolumns8 = $Workbook1.ActiveSheet.Range("L1:AK1").EntireColumn
$unneededcolumns9 = $Workbook1.ActiveSheet.Range("F1:I1").EntireColumn
$unneededcolumns10 = $Workbook1.ActiveSheet.Range("M1:AB1").EntireColumn
$unneededcolumns11 = $Workbook1.ActiveSheet.Range("N1:X1").EntireColumn
$unneededcolumns12 = $Workbook1.ActiveSheet.Range("O1:BA1").EntireColumn
$unneededcolumns13 = $Workbook1.ActiveSheet.Range("P1:U1").EntireColumn
$header.Delete()
$unneededcolumns1.Delete()
$unneededcolumns2.Delete()
$unneededcolumns3.Delete()
$unneededcolumns4.Delete()
$unneededcolumns5.Delete()
$unneededcolumns6.Delete()
$unneededcolumns7.Delete()
$unneededcolumns8.Delete()
$unneededcolumns9.Delete()
$unneededcolumns10.Delete()
$unneededcolumns11.Delete()
$unneededcolumns12.Delete()
$unneededcolumns13.Delete()
$Workbook1.SaveAs("\\output.csv")

I am just going to add this anyway since I hope to convince you how easy it will be to avoid having to use Excel.
$source = "c:\temp\file.csv"
$destination = "C:\temp\newfile.csv"
(Import-CSV $source -Header 1,2,3,4,5,6 |
Select "1","4","6" |
ConvertTo-Csv -NoTypeInformation |
Select-Object -Skip 1) -replace '"' | Set-Content $destination
We assign arbitrary headers to the object and that way we can call the 1st, 4th and 6th columns by position. Once exported the file will have the following contents which match what I think you want and not what you had in the question. Your last line had an extra value (20) on it which I don't know if it was on purpose or not.
A,3,5
B,8,10
C,13,15
D,17,19
If this is not viable I am really interested as to why.
Excel Approach
Alright, so the file is enormous and Import-CSV is not a viable option. Keeping with your Excel idea, I came up with this. It takes the column indexes you want to keep and deletes every column that is not in that list.
Wait, you say, that won't work since the column indexes change as you remove columns. Using the indexes we want to keep, we get the inverse (the columns to delete) based on the UsedRange of the sheet. We then subtract from each column to delete a value equal to its array position. The reason is that by the time a column is actually deleted, every earlier deletion has already shifted the remaining columns one position to the left.
$file = "c:\temp\file.csv"
$ColumnsToKeep = 1,4,6
# Create the com object
$excel = New-Object -comobject Excel.Application
$excel.DisplayAlerts = $False
$excel.visible = $False
# Open the CSV File
$workbook = $excel.Workbooks.Open($file)
$sheet = $workbook.Sheets.Item(1)
# Determine the number of columns in use
$maxColumns = $sheet.UsedRange.Columns.Count
$ColumnsToRemove = Compare-Object $ColumnsToKeep (1..$maxColumns) | Where-Object{$_.SideIndicator -eq "=>"} | Select-Object -ExpandProperty InputObject
0..($ColumnsToRemove.Count - 1) | %{$ColumnsToRemove[$_] = $ColumnsToRemove[$_] - $_}
$ColumnsToRemove | ForEach-Object{
[void]$sheet.Cells.Item(1,$_).EntireColumn.Delete()
}
# Save the edited file
$workbook.SaveAs("C:\temp\newfile.csv", 6)
# Close excel and release the com object.
$workbook.Close($true)
$excel.Quit()
[void][System.Runtime.Interopservices.Marshal]::ReleaseComObject($excel)
Remove-Variable excel
I was having issues with Excel remaining open even after reading up on the "correct" way to do it. The inner logic is what is important. Don't forget to change your paths as needed.
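To see why the offset adjustment works without involving Excel at all, here is a minimal sketch that replays the same arithmetic on an in-memory list. The sample row and the names $adjusted and $row are made up for illustration; only the subtract-the-array-position idea comes from the script above.

```powershell
# Hypothetical sample: one row with six fields, keeping the 1st, 4th and 6th.
$columnsToKeep = 1, 4, 6
$maxColumns    = 6

# Inverse of the keep list: the 1-based columns to delete (2, 3, 5).
$columnsToRemove = @(1..$maxColumns | Where-Object { $_ -notin $columnsToKeep })

# Subtract each entry's array position to account for earlier deletions (2, 2, 3).
$adjusted = for ($i = 0; $i -lt $columnsToRemove.Count; $i++) {
    $columnsToRemove[$i] - $i
}

# Replay the deletions on a list standing in for the sheet's columns.
$row = [System.Collections.Generic.List[string]]@('A', '1', '2', '3', '4', '5')
foreach ($index in $adjusted) {
    $row.RemoveAt($index - 1)   # RemoveAt is zero-based, the column numbers are 1-based
}

$row -join ','
```

The -notin operator needs PowerShell 3 or later; on older versions the Compare-Object line from the script does the same job.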

Here's a better approach that I use, but it's not the most performant on large files. Both have been tested on 1GB files.
Powershell:
Import-Csv '.\inputfile.csv' |
select ColumnName1,ColumnName2,ColumnName3 |
Export-Csv -Path .\outputfile.csv -NoTypeInformation
https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/export-csv?view=powershell-5.1
If you want to get rid of those pesky quotes that the tool adds, upgrade to Powershell 7.
Powershell 7+:
Import-Csv '.\inputfile.csv'
| select ColumnName1,ColumnName2,ColumnName3
| Export-Csv -Path .\outputfile.csv -NoTypeInformation -UseQuotes Never
https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/export-csv?view=powershell-7
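If the file is too large for Import-Csv to be comfortable, one possible alternative is to stream it line by line and keep only the wanted positions. This is a rough sketch, not a drop-in replacement: the temp-file paths and the sample rows are invented so the example runs on its own, and the plain Split(',') assumes no field contains a quoted comma or a line break.

```powershell
# Hypothetical paths; a small sample input is written so the sketch is self-contained.
$source      = Join-Path ([System.IO.Path]::GetTempPath()) 'input_sample.csv'
$destination = Join-Path ([System.IO.Path]::GetTempPath()) 'output_sample.csv'
Set-Content -Path $source -Value 'A,1,2,3,4,5', 'B,6,7,8,9,10', 'C,11,12,13,14,15'

# Zero-based positions of the 1st, 4th and 6th fields.
$keep = 0, 3, 5

$reader = [System.IO.StreamReader]::new($source)
$writer = [System.IO.StreamWriter]::new($destination)
try {
    # Read one line at a time so the whole file never sits in memory.
    while ($null -ne ($line = $reader.ReadLine())) {
        $fields = $line.Split(',')
        $writer.WriteLine(($fields[$keep] -join ','))
    }
}
finally {
    $reader.Dispose()
    $writer.Dispose()
}

Get-Content $destination
```

The ::new() syntax needs PowerShell 5 or later; New-Object System.IO.StreamReader $source is the equivalent on older versions.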

Related

UPDATE THE FIRST CELL IN A CSV FILE USING POWER SHELL SCRIPT

I already have this file Workspacesize.csv. I am adding a TEST value to cell (1,2) and trying to save. It prompts me that the file already exists and asks whether I want to overwrite it. I do not want this prompt. I have used $Excelobject.DisplayAlerts= 'False' but it still does not work.
$Excelobject=New-object -ComObject Excel.Application
$Excelobject.visible = $False
$workbook=$Excelobject.Workbooks.Open("C:\Users\Siddhartha.S.Das2\OneDrive - Shell\Desktop\Workspacesize.csv")
$worksheet=$workbook.worksheets.Item(1)
$worksheet.Activate()
$worksheet.cells.item(1,2)="TEST"
$workbook.SaveAs("C:\Users\Siddhartha.S.Das2\OneDrive - Shell\Desktop\Workspacesize.csv")
$workbook.close
$Excelobject.DisplayAlerts= 'False'
$Excelobject.Quit()
You're better off not using excel for csv files, it unnecessarily complicates things.
$Path = "C:\Users\Siddhartha.S.Das2\OneDrive - Shell\Desktop\Workspacesize.csv"
$Content = Import-Csv -Path $Path
$Content[0].Col1 = 'TEST' #Put your actual column name rather than Col1
$Content | Export-Csv -Path $Path -NoTypeInformation

Modify a .csv file in powershell automatically

I try to create a powershell script, to perform a few steps:
In a specific folder, I put a .xlsx file, it converts it to csv. Until now I got this:
$ErrorActionPreference = 'Stop'
Function Convert-CsvInBatch
{
[CmdletBinding()]
Param
(
[Parameter(Mandatory=$true)][String]$Folder
)
$ExcelFiles = Get-ChildItem -Path $Folder -Filter *.xlsx -Recurse
$excelApp = New-Object -ComObject Excel.Application
$excelApp.DisplayAlerts = $false
$ExcelFiles | ForEach-Object {
$workbook = $excelApp.Workbooks.Open($_.FullName)
$csvFilePath = $_.FullName -replace "\.xlsx$", ".csv"
$workbook.SaveAs($csvFilePath, [Microsoft.Office.Interop.Excel.XlFileFormat]::xlCSV)
$workbook.Close()
}
# Release Excel Com Object resource
$excelApp.Workbooks.Close()
$excelApp.Visible = $true
Start-Sleep 5
$excelApp.Quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($excelApp) | Out-Null
}
#
# 0. Prepare the folder path which contains all excel files
$FolderPath = "C:\exacthpath"
Convert-CsvInBatch -Folder $FolderPath
The columns in the file, are still there, so I want to remove them, and insert a ';' instead, like:
H;1;43;185;
At this point I'm stuck. I can import it into Powershell like:
Import-Csv -Path 'C:\folder\filename.csv' | ForEach-Object {
$_
}
I get this look, and the most important task is here, in the first row only:
H;1;43;185;
This should be modified into:
H;01;43;185
the rest should be left untouched.
After I need to export back it into a CSV file, like:
Export-Csv -Path 'C:\folder\modified_filename.csv'
But this whole process should be inserted in one single powershell script, which performs the above steps on its own. So in short:
identifies any .xlsx file - regardless of its name
converts it into .csv
modifies the outlook of the document, to separate the columns with a ";"
modify the first line to have 'H;01;43;185' - this is a static line, it will always look like this
save the created file as a final .csv file
Can you help me somehow to include/optimize the above scripts and let powershell perform the modification too? Example content of a file like this (final look). Usually it includes 1000+ lines:
H;01;43;185
D;111;3;1042;2
D;222;3;1055;3
D;333;3;1085;1
T;3;;;
Any help is highly appreciated.
Regards,
Armin
If as you say in your comment, your Excel already creates a csv with the semi-colon as delimiter, you can do this inside the loop, just below $workbook.Close()
# read the file created by Excel as string array
$data = Get-Content $csvFilePath
# overwrite the file with just the new header
Set-Content -Path $csvFilePath -Value 'H;01;43;185'
# add the rest of the data to the file
$data[1..($data.Count -1)] | Add-Content -Path $csvFilePath
P.S. I would delete the lines
$excelApp.Visible = $true
Start-Sleep 5
because I don't see the need to have Excel show itself and pause the function for 5 seconds. Instead, have Excel not show at all so it will work a lot faster by adding
$excelApp.Visible = $false
right after you have created the $excelApp
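For anyone who wants to try the header replacement in isolation, here is a small sketch under the same assumption (the file is already semicolon-delimited). The temp-file path and the three sample rows are invented for the example; in the function above you would use the existing $csvFilePath instead.

```powershell
# Hypothetical sample file standing in for the CSV that Excel produced.
$csvFilePath = Join-Path ([System.IO.Path]::GetTempPath()) 'header_sample.csv'
Set-Content -Path $csvFilePath -Value 'H;1;43;185;', 'D;111;3;1042;2', 'T;3;;;'

# Read everything except the first line; this finishes before the file is rewritten.
$rest = @(Get-Content -Path $csvFilePath | Select-Object -Skip 1)

# Write the static header followed by the untouched remaining lines.
@('H;01;43;185') + $rest | Set-Content -Path $csvFilePath

Get-Content $csvFilePath
```

Select-Object -Skip 1 avoids the index range, which matters when the file has only one line: with $data.Count equal to 1, the range 1..0 would pick up the old header again.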

Powershell - Export array to CSV in different columns

I am trying to automate below API calls from a csv file.
http_uri
/ModuleName/api/12345/moverequest/MoveRequestQueue?batchSize=200
/ModuleName/api/Portal/GetGarageLocations?email=Dummy@mail.com
/ModuleName/api/DeliveryDate/CommitEta?ref=H7J3M1EA4LF
/ModuleName/api/35345/moverequest/MoveRequestQueue?batchSize=500
The output should be like below in a csv file.
ScenarioName Parameter Value
MoveRequestQueue batchSize 200
GetGarageLocations email Dummy@mail.com
CommitEta ref H7J3M1EA4LF
MoveRequestQueue batchSize 500
I am using below code
$csv = Import-Csv C:\Powershell\Documents\Source.csv
$scenario = @()
ForEach ($row in $csv){
$httpuri = $($row.http_uri)
#Iterating through CSV rows and segregate values
if ($httpuri -match "="){
$equalarr = $httpuri -split '='
if ($equalarr[0] -match "\?"){
$questionarr = $equalarr[0] -split '\?'
$scenarionamearr = $questionarr[0] -split '/'
$totalelements = $scenarionamearr.Count
$scenarioname = $scenarionamearr[$totalelements-1]
$Scenario += $scenarioname
$Scenario += $questionarr[1]
$Scenario += $equalarr[1]
}
}
}
#Adding columns to csv
$columnName = '"Scenario","Parameter","Value"'
Add-Content -Path C:\Powershell\Documents\Output.csv -Value $columnName
#Writing values to CSV
$Scenario | foreach { Add-Content -Path C:\Powershell\Documents\Output.csv -Value $_ }
But the output is generated like below
Scenario Parameter Value
DequeueMoveRequestQueue
batchSize
200
GetCarrierLocations
email
x-qldanxqldanx
Since I am a newbie, I searched a lot to solve this issue but couldn't succeed. Please throw some light on this.
Thanks in advance....
If you store your scenarios in structured objects you can use Powershell's built in Export-Csv command to generate your csv.
So, instead of
$Scenario += $scenarioname
$Scenario += $questionarr[1]
$Scenario += $equalarr[1]
store an array of powershell objects:
$Scenario += [PSCustomObject]@{
"Scenario" = $scenarioname;
"Parameter" = $questionarr[1];
"Value" = $equalarr[1];}
Then, when creating the csv file, just use Export-Csv:
$Scenario | Export-Csv -NoTypeInformation -Path C:\Powershell\Documents\Output.csv
So the issue is that you make an empty array, then add strings to it one at a time, which just makes it an array of strings. Then when you output it to the file it just adds each string to the file on its own line. What you want to do is create an array of objects, then use the Export-Csv cmdlet to output it to a CSV file.
Creating an array, and then adding things to it one at a time is not a good way to do it. PowerShell has to recreate the array each time you add something the way you're doing it. Better would be to have a pipeline that outputs what you want (objects, rather than strings), and capture them all at once creating the array one time. Or even better, just output them to the CSV file and not recollect them in general.
$CSV = Import-Csv C:\Powershell\Documents\Source.csv
$CSV.http_uri -replace '^.*/(.*)$','$1'|ForEach-Object{
$Record = $_ -split '[=\?]'
[PSCustomObject]@{
ScenarioName = $Record[0]
Parameter = $Record[1]
Value = $Record[2]
}
} | Export-Csv -Path C:\Powershell\Documents\Output.csv -Append
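As a variation on the same idea, a single regular expression with named groups can pull out all three parts at once and skip rows that have no query string. This is only a sketch: the URIs are copied from the question as an in-memory array instead of being read from Source.csv, and the group names scenario, param and value are my own.

```powershell
# Sample URIs from the question, held in memory instead of read from Source.csv.
$uris = @(
    '/ModuleName/api/12345/moverequest/MoveRequestQueue?batchSize=200'
    '/ModuleName/api/Portal/GetGarageLocations?email=Dummy@mail.com'
    '/ModuleName/api/DeliveryDate/CommitEta?ref=H7J3M1EA4LF'
)

# Last path segment, then the parameter name, then everything after the first '='.
$pattern = '/(?<scenario>[^/?]+)\?(?<param>[^=]+)=(?<value>.*)$'

$rows = foreach ($uri in $uris) {
    if ($uri -match $pattern) {
        [PSCustomObject]@{
            ScenarioName = $Matches['scenario']
            Parameter    = $Matches['param']
            Value        = $Matches['value']
        }
    }
}

# ConvertTo-Csv shows what Export-Csv would write to the file.
$rows | ConvertTo-Csv -NoTypeInformation
```

Note that this handles one parameter per URI, like the original code; a query string with several parameters joined by & would need a further split.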

Adding multiple rows to CSV file at once through PowerShell

Background
I've been looking through several posts here on Stack and can only find answers to "how to add one single row of data to a CSV file" (notably this one). While they are good, they only refer to the specific case of adding a single entry from memory. Suppose I have 100,000 rows I want to add to a CSV file, then the speed of the query will be orders of magnitude slower if I for each row write it to file. I imagine that it will be much faster to keep everything in memory, and once I've built a variable that contains all the data that I want to add, only then write it to file.
Current situation
I have log files that I receive from customers containing about half a million rows. Some of these rows begin with a datetime and how much memory the server is using. In order to get a better view of how the memory usage looks like, I want to plot the memory usage over time using this information. (Note: yes, the best solution would be to ask the developers to add this information as it is fairly common we need this, but since we don't have that yet, I need to work with what I got)
I am able to read the log files, extract the contents, and create two variables called $timeStamp and $memoryUsage that find all the relevant entries. The problem occurs when I try to add this to a custom PSObject. It would seem that using a $csvObject += $newRow only adds a pointer to the $newRow variable rather than the actual row itself. Here's the code that I've got so far:
$header1 = "Time Stamp"
$header2 = "Memory Usage"
$csvHeaders = @"
$header1;$header2
"@
# The following two lines are a workaround to make sure that the $csvObject becomes a PSObject that matches the output I'm trying to achieve.
$csvHeaders | Out-File -FilePath $csvFullPath
$csvObject = Import-Csv -Path $csvFullPath -Delimiter ";"
foreach ($TraceFile in $traceFilesToLookAt) {
$curTraceFile = Get-Content $TraceFile.FullName
Write-Host "Starting on file: $($TraceFile.Name)`n"
foreach ($line in $curTraceFile) {
try {
if (($line.Substring(4,1) -eq '-') -and ($line.Substring(7,1) -eq '-')) {
$TimeStamp = $line.Split("|",4)[0]
$memoryUsage = $($line.Split("|",4)[2]).Replace(",","")
$newRow = New-Object PSObject -Property @{
$header1 = $TimeStamp;
$header2 = $memoryUsage
}
$reorderedRow = $newRow | Select-Object -Property $header1,$header2
$reorderedRow | Export-Csv -Path $csvFullPath -Append -Delimiter ";"
}
} catch {
Out-Null
}
}
}
This works fine as it appends the row each time it finds one to the CSV file. The problem is that it's not very efficient.
End goal
I would ideally like to solve it with something like:
$newRow = New-Object PSObject -Property @{
$header1 = $TimeStamp;
$header2 = $memoryUsage
}
$rowsToAddToCSV += $newRow
And then in the final step do a:
$rowsToAddToCSV | Export-Csv -Path $csvFullPath -Append -Delimiter ";"
I have not been able to create any form of workaround for this. Among other things, PowerShell tells me that op_Addition is not part of the object, that the object I'm trying to export (the collection of rows) doesn't match the CSV file etc.
Anything that appends thousands of items to an array in a loop is bound to perform poorly, because each time an item is appended, the array will be re-created with its size increased by one, all existing items are copied, and then the new item is put in the new free slot.
Any particular reason why you can't simply do something like this?
$traceFilesToLookAt | ForEach-Object {
Get-Content $_.FullName | ForEach-Object {
if ($_.Substring(4, 1) -eq '-' -and $_.Substring(7, 1) -eq '-') {
$line = $_.Split('|', 4)
New-Object PSObject -Property @{
'Time Stamp' = $line[0]
'Memory Usage' = $line[2].Replace(',', '')
}
}
}
} | Export-Csv -Path $csvFullPath -Append -Delimiter ";"
A regular expression match might be an even more elegant approach to extracting timestamp and memory usage from the input files, but I'm going to leave that as an exercise for you.
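For what it's worth, here is one way that exercise might look. It is a sketch under assumptions: the three sample lines and their layout (timestamp, a second field, memory usage, then the rest, separated by |) are guessed from the Split calls in the question, not taken from a real trace file.

```powershell
# Hypothetical trace lines; the real log layout may differ.
$lines = @(
    '2020-01-15 10:22:31|INFO|1,234,567|Cache refreshed'
    'not a data line'
    '2020-01-15 10:22:36|INFO|1,240,112|Request served'
)

# A line starting with yyyy-MM-dd: first field is the timestamp, third is the memory usage.
$pattern = '^(?<ts>\d{4}-\d{2}-\d{2}[^|]*)\|[^|]*\|(?<mem>[^|]*)'

$rows = foreach ($line in $lines) {
    if ($line -match $pattern) {
        [PSCustomObject]@{
            'Time Stamp'   = $Matches['ts']
            'Memory Usage' = $Matches['mem'] -replace ','   # drop thousands separators
        }
    }
}

# ConvertTo-Csv shows what Export-Csv -Delimiter ';' would write.
$rows | ConvertTo-Csv -NoTypeInformation -Delimiter ';'
```

A side benefit is that lines shorter than eight characters simply fail to match, so the try/catch around Substring is no longer needed.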

How can I count the number of CSV columns when the file has multiline data and no header

My CSV files have no headers and multi line entries like this:
11;"multi line
col12";13;foobar;foobar
21;22;23;24;25
And I'd like to count the number of columns. So 5 in this example. How do I do that?
What I tried:
Import-CSV doesn't work without the header parameter due to duplicate entries on the first line.
(Import-Csv .\bad.csv -Delimiter ";" | get-member -type NoteProperty).count
Adding a header parameter skews the count.
(Import-Csv .\bad.csv -Delimiter ";" -Header (1..99) | get-member -type NoteProperty).count
I had to abort reading the file manually via Get-Content because of all the parsing I would have to handle manually. Escaping characters and multi line entries...
My version of PowerShell is 3 and I have to port my script to version 2 later on.
If you are willing to accept the caveat that this could miscount the number of columns if there are quoted delimiters in string this could be good enough for you.
$path = "c:\temp\test.txt"
$delimiter = ";"
$numberOfColumns = Get-Content $path |
ForEach-Object{($_.split($delimiter)).Count} |
Measure-Object -Maximum |
Select-Object -ExpandProperty Maximum
Import-Csv $path -Header (1..$numberOfColumns) -Delimiter $delimiter
Read in the file with Get-Content and isolate the maximum number of columns by splitting each line on its delimiter, then use that value to import the CSV. If the file is large you can read in the file once with Get-Content and then use ConvertFrom-Csv once you know your column count.
If all lines contain a line break on them the above logic would fail. Still we could temporarily scrub the data by removing the correct line breaks in order to get the accurate count.
$delimiter = ";"
$fileData = (Get-Content $path | Out-String)
$numberOfColumns = ((($fileData -replace "(`"[^;]+?)`r`n",'$1') -split "`r`n" | Select -First 1).split($delimiter)).Count
$fileData | ConvertFrom-Csv -Header (1..$numberOfColumns) -Delimiter $delimiter
What this will do is find lines that end where there is a double quote followed by data that does not contain the delimiter. We also match the newline that follows but drop that same new line in the replacement. If that is done we know that the first line is proper. Use that same line to split and count just like before.
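Another angle, offered as a sketch and not something I have tried on PowerShell 2: let Import-Csv do the quote and line-break parsing with a deliberately oversized header, then count only the properties of the first record that actually received a value. This relies on missing trailing fields coming back as $null (an empty field comes back as an empty string, so it still counts). The temp-file path is invented, and the sample data is the one from the question.

```powershell
# Hypothetical sample file reproducing the data from the question.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'bad_sample.csv'
Set-Content -Path $path -Value '11;"multi line', 'col12";13;foobar;foobar', '21;22;23;24;25'

# Import with more headers than any row could need and keep the first record.
$first = Import-Csv -Path $path -Delimiter ';' -Header (1..99) | Select-Object -First 1

# Columns that exist in the row have a value; the surplus header names stay $null.
$columnCount = @($first.PSObject.Properties | Where-Object { $null -ne $_.Value }).Count

$columnCount
```

If rows can have different lengths, the same count could be taken per record and the maximum used instead.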
Since Excel knows, let's ask him :
$path = "path\to\bad.csv"
$excel = New-Object -ComObject Excel.Application
$workbook = $excel.Workbooks.Open($path)
$sheet = $workbook.ActiveSheet
$columnIndex = 1
while($sheet.Cells.Item(1, $columnIndex).Text -ne "") {
$columnIndex++
}
"There are $($columnIndex - 1) columns in CSV file $path"
Start-Sleep -Seconds 1
Get-Process excel | Stop-Process -Force
As pointed out by Ansgar Wiechers in comments, there is a much shorter solution :
$path = "path\to\bad.csv"
$excel = New-Object -ComObject Excel.Application
$workbook = $excel.Workbooks.Open($path)
$sheet = $workbook.ActiveSheet
$columnCount = $sheet.UsedRange.Columns.Count
"There are $columnCount columns in CSV file $path"
Start-Sleep -Seconds 1
Get-Process excel | Stop-Process -Force
(I know my way of killing Excel is dirty, but iirc it takes too much code to do so)
I know this is very old, but I came across a similar situation today (I did not have rows with varying numbers of columns) and found my own solution, so I thought I would share it for anyone else coming into this situation. My solution was to use Get-Content for the first row of the CSV and -split on the delimiter (,) to create an array and then return the count of the array. As mentioned in replies above, this will not account for delimiters existing within quotations.
((Get-Content $PathToCsv)[0] -split ",").count
I had the same issue and went with AAgent's suggestion.
$CommaCount = ((Get-Content $PathToCsv)[0] -split ",").count
$SemicolonCount = ((Get-Content $PathToCsv)[0] -split ";").count
if ($CommaCount -gt $SemicolonCount){
$CMSlist = Import-Csv ($PathToCsv) -Delimiter ","
}
else{
$CMSlist = Import-Csv ($PathToCsv) -Delimiter ";"