Remove commas from numbers in a CSV - PowerShell

I have folder info for all user folders. It is dumped out to a CSV file as follows:
Servername, F:\Users\user, 9,355.7602 MB, 264, 3054, 03/15/2000 13:28:48, 12/10/2018 11:58:29
We are unable to work with the data as-is due to the thousands separator in the 3rd column. I could run the report scripts again, but we have a lot of file servers and a large number of users on one in particular, so running it again would be very time consuming. The commas are there because the data was written as a string, not a number.
I can import and convert the data; the only problem is that any number over 1000 will be wrong, and then all the other data is one column off. I would like to replace any comma that sits between two digits. It doesn't seem like it would be that hard to do with PowerShell, but I am not having any luck finding anything.

If you assume that columns of data are comma plus space separated and your numbers have no spaces, you can use the -replace operator for this.
$line = 'Servername, F:\Users\user, 9,355.7602 MB, 264, 3054, 03/15/2000 13:28:48, 12/10/2018 11:58:29'
$line -replace '(?<=\d),(?=\d)'
If you are reading the data from a file, you can read the data with Get-Content, replace your data, and update the file with Set-Content.
(Get-Content file.csv) -replace '(?<=\d),(?=\d)' | Set-Content file.csv
If the file is large, you can utilize the faster switch statement.
$data = switch -regex -file file.csv {
    '(?<=\d),(?=\d)' { $_ -replace '(?<=\d),(?=\d)' }
    default { $_ }
}
$data | Set-Content file.csv
Explanation:
(?<=\d) uses a positive lookbehind assertion (?<=) that matches a single digit \d.
(?=\d) uses a positive lookahead assertion (?=) that matches a single digit. You could replace this with (?=\d{3}) to match 3 consecutive digits after the comma.
Since you want to replace the target comma with empty string, you do not need a replacement string.
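For instance, here's a quick sketch against a made-up string (not from the original data) showing how the stricter lookahead leaves non-thousands commas alone:
'1,2 and 9,355' -replace '(?<=\d),(?=\d)'    # -> '12 and 9355' (strips both commas)
'1,2 and 9,355' -replace '(?<=\d),(?=\d{3})' # -> '1,2 and 9355' (only strips the thousands separator)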
Typically, it would be best to stick with commands that work with CSV data or files. However, if your data contains commas and you aren't qualifying your text, it may be difficult to distinguish between data and delimiters. If you have a clear way of making that distinction, you are better off using ConvertFrom-Csv for already read data or Import-Csv for files. You will need to define headers either in the files or in the command.

EDIT
It was my oversight that the , in the dataset is not delimited, which causes this answer to not work as expected, since the comma is seen as a column separator when parsing the CSV. I'm going to leave it up, as it does explain how to generally manipulate the data as you'd expect, if the column data were escaped properly. However, @AdminOfThings' answer below should work for your specific case here, and will fix the erroneously defined column without relying on parsing the CSV content as a CSV first.
Import the data using Import-Csv, then remove any , in the third column. This assumes that you have no values where , is the decimal separator:
If you have headers in the CSV, you won't need to define header names or get fancy with writing the CSV back out:
# Parentheses force the whole file to be read before Export-Csv rewrites it
(Import-Csv -Path \path\to\file.csv) | Foreach-Object {
    $_.ColumnName = $_.ColumnName -replace ','
    $_ # emit the modified object, otherwise nothing reaches Export-Csv
} | Export-Csv -NoTypeInformation -Path \path\to\file.csv
The way this works is that we import the CSV as an operable PSCustomObject, then for each line we take whatever the column name with the size is and remove the , from it. Finally, we export the modified PSCustomObject back out to the original CSV.
If you don't have headers, it gets a little trickier, since we have to define temporary headers but Export-Csv doesn't have an option to skip writing them out:
# Parentheses again force a full read before Set-Content rewrites the same file
(Import-Csv -Path \path\to\file.csv -Header Col1, Col2, Col3, Col4, Col5, Col6, Col7) |
    Foreach-Object {
        $_.Col3 = $_.Col3 -replace ','
        $_ # emit the modified object into the pipeline
    } | ConvertTo-Csv -NoTypeInformation | Select-Object -Skip 1 |
    Set-Content -Path \path\to\file.csv
This does the same thing as the first block of code, but since we don't want to export the temporary headers, we have to get creative. First, note that we reference the target column by its temporary header name. Instead of piping the modified CSV objects straight to Export-Csv, we convert them to CSV text with ConvertTo-Csv -NoTypeInformation (so the first line really is the header, not a #TYPE line). We then use Select-Object to skip that first line, leaving just the row data and column values. Finally, we use Set-Content to write the headerless CSV text back to the original file.

Related

Use PowerShell to see if a column is empty and then delete the entire row from a CSV file

I have a CSV file, with no headers, that looks like this:
"88212526";"Starter";"PowerMax";"4543";"5713852369748";"146,79";"EUR";"6"
"88212527";"Starter";"PowerMax";"4543";"5713852369755";"66,88";"EUR";"20"
"88212530";"Starter";"PowerMax";"4543";"5713852369786";"143,27";"EUR";"0"
"88212532";"Starter";"PowerMax";"4543";"5713852369809";"80,98";"EUR";"6"
"88212536";"Starter";"PowerMax";"4543";"5713852369847";"";"EUR";"0"
"88212542";"Starter";"PowerMax";"4543";"5713852369908";"77,16";"EUR";"9"
"88212543";"Starter";"PowerMax";"4543";"5713852369915";"77,46";"EUR";"52"
I need a script in PowerShell that deletes the entire row if column 6 is empty.
I have tried this
Foreach ($line in Get-Content .\POWERMAX_DK_1.csv) {
    $linearray = $line.split(";")
    if($linearray[6] -ne "") {
        Add-Content .\myTempFile.csv $line
    }
}
But it doesn't work. The line with the empty column is not removed.
Please help
/Kim
Your immediate problem is twofold:
As Mauro Takeda's answer points out, to access the 6th element, you must use index 5, given that array indices are 0-based.
Since you're reading your CSV file as plain text, the field you're looking for has the verbatim content "", i.e. including the double quotes, so you'd have to test $linearray[5] -ne '""' instead of -ne "".
However, it's worth changing your approach:
Use Import-Csv to import your CSV file, which in your case requires manually supplying headers (column names) with the -Header parameter.
This outputs objects whose properties are named for the columns, and whose property values have the syntactic " delimiters removed.
These properties can then be used to robustly filter the input with the Where-Object cmdlet.
In order to convert the results back to a CSV file, use a single call to Export-Csv, as shown below (see next point).
Using Add-Content in a loop body is ill-advised for performance reasons, because the file has to be opened and closed in every iteration; instead, pipe to a single call of a file-writing cmdlet - see this answer for background information.
Therefore:
# Note: The assumption is that there are 8 columns, as shown in the sample data.
# Adjust as needed.
Import-Csv .\POWERMAX_DK_1.csv -Delimiter ';' -Header (1..8) |
    Where-Object 6 -ne '' |
    Export-Csv -NoTypeInformation .\myTempFile.csv
Character-encoding caveat: In Windows PowerShell, Export-Csv uses ASCII(!) by default; PowerShell (Core) 7+ commendably uses BOM-less UTF-8. Use the -Encoding parameter as needed.
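For example, a minimal sketch of the same pipeline with an explicit encoding (in Windows PowerShell, UTF8 here writes a BOM):
Import-Csv .\POWERMAX_DK_1.csv -Delimiter ';' -Header (1..8) |
    Where-Object 6 -ne '' |
    Export-Csv -NoTypeInformation -Encoding UTF8 .\myTempFile.csv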
If you need to check column 6, you have to use $linearray[5], because arrays start counting at zero ($linearray[0] is the first element).
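To illustrate both points with one of the sample rows (a throwaway snippet, not part of the fix):
$line = '"88212536";"Starter";"PowerMax";"4543";"5713852369847";"";"EUR";"0"'
$linearray = $line.split(";")
$linearray[5]          # -> "" (the sixth field lives at 0-based index 5)
$linearray[5] -eq '""' # -> True: the double quotes are part of the raw text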

Issues merging multiple CSV files in Powershell

I found a nifty command here - http://www.stackoverflow.com/questions/27892957/merging-multiple-csv-files-into-one-using-powershell that I am using to merge CSV files -
Get-ChildItem -Filter *.csv | Select-Object -ExpandProperty FullName | Import-Csv | Export-Csv .\merged\merged.csv -NoTypeInformation -Append
Now this does what it says on the tin and works great for the most part. I have 2 issues with it however, and I am wondering if there is a way they can be overcome:
Firstly, the merged csv file has CRLF line endings, and I am wondering how I can make the line endings just LF, as the file is being generated?
Also, it looks like there are some shenanigans with quote marks being added/moved around. As an example:
Sample row from initial CSV:
"2021-10-05"|"00:00"|"1212"|"160477"|"1.00"|"3.49"LF
Same row in the merged CSV:
"2021-10-05|""00:00""|""1212""|""160477""|""1.00""|""3.49"""CRLF
So you can see that the first field has lost its trailing quote, other fields have doubled quotes, and the end of the row has an additional quote. I'm not quite sure what is going on here, so any help would be much appreciated!
For dealing with the quotes, the cause of the “problem” is that your CSV does not use the default field delimiter that Import-CSV assumes - the C in CSV stands for comma, and you’re using the vertical bar. Add the parameter -Delimiter "|" to both the Import-CSV and Export-CSV cmdlets.
I don’t think you can do anything about the line-end characters (CRLF vs LF); that’s almost certainly operating-system dependent.
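Applied to the original command, that would look something like this (a sketch, assuming every input file uses | as its delimiter):
Get-ChildItem -Filter *.csv | Select-Object -ExpandProperty FullName |
    Import-Csv -Delimiter '|' |
    Export-Csv .\merged\merged.csv -NoTypeInformation -Append -Delimiter '|'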
Jeff Zeitlin's helpful answer explains the quote-related part of your problem well.
As for your line-ending problem:
As of PowerShell 7.2, there are no PowerShell-native features that allow you to control the newline format of file-writing cmdlets such as Export-Csv.
However, if you use plain-text processing, you can use multi-line strings built with the newline format of interest and save / append them with Set-Content and its -NoNewLine switch, which writes the input strings as-is, without a (newline) separator.
In fact, to significantly speed up processing in your case, plain-text handling is preferable, since in essence your operation amounts to concatenating text files, the only twist being that the header lines of all but the first file should be skipped; using plain-text handling also bypasses your quote problem:
$tokenCount = 1
Get-ChildItem -Filter *.csv |
    Get-Content -Raw |
    ForEach-Object {
        # Get the file content and replace CRLF with LF.
        # Include the first line (the header) only for the first file.
        $content = ($_ -split '\r?\n', $tokenCount)[-1].Replace("`r`n", "`n")
        $tokenCount = 2 # Subsequent files should have their header ignored.
        # Make sure that each file's content ends in a LF.
        if (-not $content.EndsWith("`n")) { $content += "`n" }
        # Output the modified content.
        $content
    } |
    Set-Content -NoNewLine ./merged/merged.csv # add -Encoding as needed.

Reformat column names in a csv with PowerShell

Question
How do I reformat an unknown CSV column name according to a formula or subroutine (e.g. rename column " Arbitrary Column Name " to "Arbitrary Column Name" by running a trim or regex or something) while maintaining data?
Goal
I'm trying to more or less sanitize columns (the names) in a hand-produced (or at least hand-edited) csv file that needs to be processed by an existing PowerShell script. In this specific case, the columns have spaces that would be removed by a call to [String]::Trim(), or which could be ignored with an appropriate regex, but I can't figure a way to call or use those techniques when importing or processing a CSV.
Short Background
Most files and columns have historically been entered into the CSV properly, but recently a few columns were being dropped during processing; I determined it was because the files contained a space (e.g., Select-Object was being told to get "RFC", but Import-CSV retrieved "RFC ", so no matchy-matchy). Telling the customer to enter it correctly by hand (though preferred and much simpler) is not an option in this case.
Options considered
I could manually process the text of the file, but that is a messy and error prone way to re-invent the wheel. I wonder if there's a syntax with Select-Object that would allow a softer match for column names, but I can't find that info.
The closest I have come conceptually is using a calculated property in the call to Select-Object to rename the column, but I can only find ways to rename a known column to another known column. So, this would require enumerating the columns and matching them exactly (preferred) or a softer match (like comparing after trimming or matching via regex as a fallback) with expected column names, then creating a collection of name mappings to use in constructing calculated properties from that information to select into a new object.
That seems like it would work, but it's more work than I'd prefer, and I can't help but hope that there's a simpler way I haven't been able to find via Google. Maybe I should try Bing?
Sample File
Let's say you have a file.csv like this:
" RFC "
"1"
"2"
"3"
Code
Now try to run the following:
$CSV = Get-Content file.csv -First 2 | ConvertFrom-Csv
$FixedHeaders = $CSV.PSObject.Properties.Name.Trim(' ')
Import-Csv file.csv -Header $FixedHeaders |
Select-Object -Skip 1 -Property RFC
Output
You will get this output:
RFC
---
1
2
3
Explanation
First we use Get-Content with parameter -First 2 to get the first two lines. Piping to ConvertFrom-Csv will allow us to access the headers with PSObject.Properties.Name. Use Import-Csv with the -Header parameter to use the trimmed headers. Pipe to Select-Object and use -Skip 1 to skip the original headers.
I'm not sure about comparisons in terms of efficiency, but I think this is a little more hardened, and imports the CSV only once. You might be able to use @lahell's approach and Get-Content -Raw, but this was done and it works, so I'm gonna leave it to the community to determine which is better...
#import the CSV
$rawCSV = Import-Csv $Path

#get actual header names and map to their reformatted versions
$CSVColumns = @{}
$rawCSV |
    Get-Member |
    Where-Object {$_.MemberType -eq "NoteProperty"} |
    Select-Object -ExpandProperty Name |
    Foreach-Object {
        #add a mapping to the original from a trimmed and whitespace-reduced version of the original
        $CSVColumns.Add(($_.Trim() -replace '(\s)\s+', '$1'), "$_")
    }

#Create the array of names and calculated properties to pass to Select-Object
$SelectColumns = @()
$CSVColumns.GetEnumerator() |
    Foreach-Object {
        $SelectColumns += if ($CSVColumns.Values -contains $_.Key) { $_.Key }
                          else { @{Name = $_.Key; Expression = $CSVColumns[$_.Key]} }
    }

$FormattedCSV = $rawCSV |
    Select-Object $SelectColumns
This was hand-copied to a computer where I don't have the rights to run it, so there might be an error - I tried to copy it correctly
You can use gocsv (https://github.com/DataFoxCo/gocsv) to see the headers of the CSV; you can then rename the headers, behead the file, swap columns, join, merge - any number of transformations you want.
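For example (subcommand names as listed in the gocsv README; treat this as a sketch and check the docs for your version):
gocsv headers .\file.csv                              # list the header names
gocsv behead .\file.csv | Set-Content .\headless.csv  # drop the header row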

Export comma-separated text file to csv and maintain leading zeros

I have 3 .txt files that each need to be converted into .csv files. Each file has 12 columns and some of these columns have data with leading zeroes. These zeroes need to remain. Is there a way through PowerShell to write a loop that will export each of these to a .csv and maintain the leading zeros?
The closest thing I could do was to export them one at a time, but this doesn't maintain the leading zeros that I need.
Import-Csv C:\AcctsLog.txt -Delimiter ";" | Export-Csv C:\AcctsLog.csv
A sample line would be something like:
Joe Smith;1933 Test Lane;Apt 34;Los Angeles;CA;90003-3444;0000000023;0002;New Car;SmithJoe@yahoo.com;00934200034006700213;0000666666
See if this works with your data:
Import-Csv C:\AcctsLog.txt -Delimiter ';' -Header (1..12) |
    ConvertTo-Csv -NoTypeInformation | Select-Object -Skip 1 |
    Set-Content C:\AcctsLog.csv
If you explicitly want it to include the leading 0's in Excel you would have to save it as an Excel file (otherwise Excel strips leading zeros off values that it interprets as numbers when opening a CSV). You could paste the data into Excel after formatting the cells as Text, then save the files as excel files. But if you want CSV files then go with mjolinor's answer since it produces CSV files with the leading zeros, exactly like you asked for.
To work with Excel you have to create an Excel ComObject. Then you can get the content of your file, replace the semicolons with tabs, pipe to Clip, and paste right into Excel (after creating a workbook and formatting the 12 columns that you need). Should be pretty simple:
$Excel = New-Object -ComObject Excel.Application
$Excel.Visible = $true
$FileList = @("C:\Temp\AcctsLog.txt","C:\Temp\SecondFile.txt","C:\Temp\ThirdFile.txt")
ForEach($File in $FileList){
    [void]$Excel.Workbooks.Add()
    # '@' is Excel's Text format, so pasted values keep their leading zeros
    $Excel.ActiveSheet.Range("A:L").NumberFormat = '@'
    (Get-Content $File) -replace ';', "`t" | Clip
    $Excel.ActiveSheet.Paste()
    $Excel.ActiveWorkbook.SaveAs(($File -replace "txt$","xlsx"))
    $Excel.ActiveWorkbook.Close($false)
}
$Excel.Quit()
There is a simple way to maintain the leading zeroes in Excel: prefix the value with an apostrophe, then type whatever value you need, and the zeroes will be retained.
For example, if I want 0000000023, I type into the cell:
'0000000023
The ' symbol retains the zeroes as long as you type it before the value.

PowerShell: Read text, regex sort, write output to file and formatting

I am a Powershell novice and have run into a challenge in reading, sorting, and outputting a csv file. The input csv has no headers, the data is as follows:
05/25/2010,18:48:33,Stop,a1usak,10.128.212.212
05/25/2010,18:48:36,Start,q2uhal,10.136.198.231
05/25/2010,18:48:09,Stop,s0upxb,10.136.198.231
I use the following piping construct to read the file, sort and output to a file:
(Get-Content d:\vpnData\u62gvpn2.csv) | %{,[regex]::Split($_, ",")} | sort @{Expression={$_[3]}},@{Expression={$_[1]}} | out-file d:\vpnData\u62gvpn3.csv
The new file is written with the following format:
05/25/2010
07:41:57
Stop
a0uaar
10.128.196.160
05/25/2010
12:24:24
Start
a0uaar
10.136.199.51
05/25/2010
20:00:56
Stop
a0uaar
10.136.199.51
What I would like to see in the output file is a similar format to the original input file, with comma delimiters:
05/25/2010,07:41:57,Stop,a0uaar,10.128.196.160
05/25/2010,12:24:24,Start,a0uaar,10.136.199.51
05/25/2010,20:00:56,Stop,a0uaar,10.136.199.51
But I can't quite seem to get there. I'm almost of the mind that I'll have to write another segment to read the newly produced file and reset its contents to the preferred format for further processing.
Thoughts?
So you want to sort on the fourth and second columns, then write out a csv file?
You can use import-csv to suck the file into memory, specifying the column names with the -header argument. The export-csv command, however, will write a header row out to the destination file and wrap the values in double-quotes, which you probably don't want.
This works, though:
import-csv -header d,t,s,n,a test.csv |
sort n,t |
%{write-output ($_.d + "," + $_.t + "," + $_.s + "," + $_.n + "," + $_.a) }
(I've wrapped it onto multiple lines for readability.)
If you redirect the output of that back to a file, it should do what you want.
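For example (sorted.csv is just a hypothetical output name):
import-csv -header d,t,s,n,a test.csv |
    sort n,t |
    %{write-output ($_.d + "," + $_.t + "," + $_.s + "," + $_.n + "," + $_.a) } |
    Set-Content sorted.csv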
You can also use ConvertFrom-Csv in a similar way:
ConvertFrom-Csv -Header date, time, status, user, ip @"
05/25/2010,18:48:33,Stop,a1usak,10.128.212.212
05/25/2010,18:48:36,Start,q2uhal,10.136.198.231
05/25/2010,18:48:09,Stop,s0upxb,10.136.198.231
"@