Remove start and end spaces in specific csv column - powershell

I am trying to remove leading and trailing spaces from column data in a CSV file. I've got a solution that removes all spaces in the CSV, but it turns the description column into unreadable text.
Get-Content -Path test.csv | ForEach-Object { $_.Trim() -replace "\s+" } | Out-File -FilePath out.csv -Encoding ascii
e.g.
'192.168.1.2' ' test-1-TEST' 'Ping Down at least 1 min' '3/11/2017' 'Unix Server' 'Ping' 'critical'
'192.168.1.3' ' test-2-TEST' ' Ping Down at least 3 min' '3/11/2017' 'windows Server' 'Ping' 'critical'
I only want to remove the space from ' test-1-TEST' and not from 'Ping Down at least 1 min'. Is this possible?

"IP","ServerName","Status","Date","ServerType","Test","State"
"192.168.1.2"," test-1-TEST","Ping Down at least 1 min","3/11/2017","Unix Server","Ping","critical"
"192.168.1.3"," test-2-TEST"," Ping Down at least 3 min","3/11/2017","windows Server","Ping","critical"
For the example file above:
Import-Csv C:\folder\file.csv | ForEach-Object {
    $_.ServerName = $_.ServerName.Trim()
    $_
} | Export-Csv C:\folder\file2.csv -NoTypeInformation
Replace ServerName with the name of the Column you want to remove spaces from (aka trim).
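If more than one column needs cleaning, a small extension of the same idea (my own hedged sketch, not part of the original answer) would be to trim every property of each row:
Import-Csv C:\folder\file.csv | ForEach-Object {
    # Trim leading/trailing whitespace from every column value in this row.
    foreach ($prop in $_.PSObject.Properties) { $prop.Value = $prop.Value.Trim() }
    $_
} | Export-Csv C:\folder\file2.csv -NoTypeInformation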

If your CSV does not have a header (which means it's not a true CSV), and/or you want to better preserve the original file structure and formatting, you could try to expand on your regex a little.
(Get-Content c:\temp\test.txt -Raw) -replace "(?<=')\s+(?=[^' ])" -replace "(?<=[^' ])\s+(?=')"
That should remove all leading and trailing spaces inside the quoted values, but not the delimiters themselves.
Read the file in as one string. That could be a bad idea depending on file size, but it is not required, as the solution does not depend on it; the file can still be read line by line with the same transformation, achieving the same result. Two similar replacements are used: the first looks for spaces that exist after a single quote but are not followed by another quote or space; the second looks for spaces before a quote that are not preceded by a quote or space.
I just wanted to give a regex example. You can look into this in more detail, with an explanation, at regex101.com. There you will see an alternation pattern used instead of two separate replacements.
(Get-Content c:\temp\test.txt -Raw) -replace "(?<=')\s+(?=[^' ])|(?<=[^' ])\s+(?=')"
The first example is a little easier on the eyes.
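To illustrate the effect, here is my own quick test against the sample line from the question (the output comment is what I would expect, not part of the original answer):
"'192.168.1.3' ' test-2-TEST' ' Ping Down at least 3 min'" -replace "(?<=')\s+(?=[^' ])|(?<=[^' ])\s+(?=')"
# Output: '192.168.1.3' 'test-2-TEST' 'Ping Down at least 3 min'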
I was having trouble replicating this consistently, but if you find that it is also replacing newlines, you can just do the replacement one line at a time and that should work as well:
(Get-Content c:\temp\test.txt) | ForEach-Object {
    $_ -replace "(?<=')\s+(?=[^' ])|(?<=[^' ])\s+(?=')"
} | Set-Content c:\temp\test.txt

Related

Issues merging multiple CSV files in Powershell

I found a nifty command here - http://www.stackoverflow.com/questions/27892957/merging-multiple-csv-files-into-one-using-powershell that I am using to merge CSV files -
Get-ChildItem -Filter *.csv | Select-Object -ExpandProperty FullName | Import-Csv | Export-Csv .\merged\merged.csv -NoTypeInformation -Append
Now this does what it says on the tin and works great for the most part. I have two issues with it, however, and I am wondering if there is a way they can be overcome:
Firstly, the merged CSV file has CRLF line endings; how can I make the line endings just LF as the file is being generated?
Also, it looks like there are some shenanigans with quote marks being added/moved around. As an example:
Sample row from initial CSV:
"2021-10-05"|"00:00"|"1212"|"160477"|"1.00"|"3.49"LF
Same row in the merged CSV:
"2021-10-05|""00:00""|""1212""|""160477""|""1.00""|""3.49"""CRLF
So you can see that the first field has lost its closing quote, the other fields have doubled quotes, and the end of the row has an additional quote. I'm not quite sure what is going on here, so any help would be much appreciated!
For dealing with the quotes, the cause of the “problem” is that your CSV does not use the default field delimiter that Import-CSV assumes - the C in CSV stands for comma, and you’re using the vertical bar. Add the parameter -Delimiter "|" to both the Import-CSV and Export-CSV cmdlets.
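Applied to the command from the question, that would look something like this (a hedged sketch, untested against your data):
Get-ChildItem -Filter *.csv |
    Select-Object -ExpandProperty FullName |
    Import-Csv -Delimiter '|' |
    Export-Csv .\merged\merged.csv -Delimiter '|' -NoTypeInformation -Append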
I don’t think you can do anything about the line-end characters (CRLF vs LF); that’s almost certainly operating-system dependent.
Jeff Zeitlin's helpful answer explains the quote-related part of your problem well.
As for your line-ending problem:
As of PowerShell 7.2, there are no PowerShell-native features that allow you to control the newline format of file-writing cmdlets such as Export-Csv.
However, if you use plain-text processing, you can use multi-line strings built with the newline format of interest and save / append them with Set-Content and its -NoNewLine switch, which writes the input strings as-is, without a (newline) separator.
In fact, to significantly speed up processing in your case, plain-text handling is preferable, since in essence your operation amounts to concatenating text files, the only twist being that the header lines of all but the first file should be skipped; using plain-text handling also bypasses your quote problem:
$tokenCount = 1
Get-ChildItem -Filter *.csv |
  Get-Content -Raw |
  ForEach-Object {
    # Get the file content and replace CRLF with LF.
    # Include the first line (the header) only for the first file.
    $content = ($_ -split '\r?\n', $tokenCount)[-1].Replace("`r`n", "`n")
    $tokenCount = 2 # Subsequent files should have their header ignored.
    # Make sure that each file's content ends in a LF.
    if (-not $content.EndsWith("`n")) { $content += "`n" }
    # Output the modified content.
    $content
  } |
  Set-Content -NoNewLine ./merged/merged.csv # add -Encoding as needed.
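If you want to verify the result, a quick hedged check (path assumed from the snippet above) for stray CR characters could be:
# Should output False if the merged file contains only LF newlines.
(Get-Content -Raw ./merged/merged.csv) -match "`r"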

Powershell: I need to clean a set of csv files; there is an inconsistent number of garbage rows above the headers that must go before import

I have a set of CSV files that I need to import data from; the issue I'm running into is that the number of garbage rows above the header line, and their content, is always different. The header rows themselves are consistent, so I could use that to detect what the starting point should be.
I'm not quite sure where to start, the files are structured as below.
Here there be garbage.
So much garbage, between 12 and 25 lines of it.
Header1,Header2,Header3,Header4,Header5
Data1,Data2,Data3,Data4,Data5
My assumption is that the best method would be something that checks for the line number of the header row and then a Get-Content call that starts reading from that line number.
Any guidance would be most appreciated.
If the header line is as you say consistent, you could do something like this:
$header = 'Header1,Header2,Header3,Header4,Header5'
# read the file as single multiline string
# and split on the escaped header line
$data = ((Get-Content -Path 'D:\theFile.csv' -Raw) -split [regex]::Escape($header), 2)[1] |
ConvertFrom-Csv -Header $($header -split ',')
As per your comment, you really only want to do a clean-up on these files rather than import data from them (your question says "I need to import data"); in that case, all you have to do is append this line of code:
$data | Export-Csv -Path 'D:\theFile.csv' -NoTypeInformation
The line ConvertFrom-Csv -Header $($header -split ',') parses the data into an array of objects, (re)using the header line that was taken off by the split.
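As a quick usage example (assuming the sample data from the question), the resulting objects expose the headers as properties:
$data[0].Header1   # should output: Data1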
A pure textual approach (without parsing the data) still needs to write out the header line, because splitting the file content on it removed it from the resulting array:
$header = 'Header1,Header2,Header3,Header4,Header5'
# read the file as single multiline string
# and split on the escaped header line
$data = ((Get-Content -Path 'D:\theFile.csv' -Raw) -split [regex]::Escape($header), 2)[1]
# rewrite the file with just the header line
$header | Set-Content -Path 'D:\theFile.csv'
# then write all data lines we captured in variable $data
$data | Add-Content -Path 'D:\theFile.csv'
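For completeness, a different way to locate the header (my own hedged sketch, not part of the answer above, reusing the same assumed file path) would be to find its line number with Select-String and skip everything before it:
$headerLine = (Select-String -Path 'D:\theFile.csv' -Pattern '^Header1,Header2,' -List).LineNumber
Get-Content -Path 'D:\theFile.csv' | Select-Object -Skip ($headerLine - 1) | ConvertFrom-Csv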
To offer a slightly more concise (and marginally more efficient) alternative to Theo's helpful answer, using the -replace operator:
If you want to import the malformed CSV file directly:
(Get-Content -Raw file.csv) -replace '(?sm)\A.*(?=^Header1,Header2,Header3,Header4,Header5$)' |
ConvertFrom-Csv
If you want to save the cleaned-up data back to the original file (adjust -Encoding as needed):
(Get-Content -Raw file.csv) -replace '(?sm)\A.*(?=^Header1,Header2,Header3,Header4,Header5$)' |
Set-Content -NoNewLine -Encoding utf8 file.csv
Explanation of the regex:
(?sm) sets the following regex options: single-line (s: make . match newlines too) and multi-line (m: make ^ and $ also match the start and end of individual lines inside a multi-line string).
\A.* matches any (possibly empty) text (.*) from the very start (\A) of the input string.
(?=...) is a positive lookahead assertion that matches the enclosed subexpression (symbolized by ... here) without consuming it, i.e. without making it part of what the regex considers the matching part of the string.
^Header1,Header2,Header3,Header4,Header5$ matches the header line of interest, as a full line.
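As a quick illustration of the overall effect, here is a contrived sample string of my own (built with LF newlines); the comment describes the expected result:
$raw = "Here there be garbage.`nSo much garbage.`nHeader1,Header2,Header3,Header4,Header5`nData1,Data2,Data3,Data4,Data5"
$raw -replace '(?sm)\A.*(?=^Header1,Header2,Header3,Header4,Header5$)' | ConvertFrom-Csv
# -> one object with Header1=Data1, ..., Header5=Data5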

Replacing characters in the middle of a line in a text file with PowerShell

I have the code below that I use to replace a string within a text file, and it works. However, I have run into the problem where the string I want to replace is not always exactly the same, but the first few characters are. I want to find the first few characters, count 3 characters ahead, and replace that with what I want.
For example, the line I am replacing may be 123xxx, or 123aaa, etc. but the value I am replacing it with is always going to be known. How would I go about replacing the 123xxx when I won't always know what the xxx is?
((Get-Content -path $ConfPath -Raw) -replace $OldVersion,$NewVersion) | Set-Content -Path $ConfPath
As -replace uses regex, you need to escape the characters that have special meaning, which in your case is the dot (it matches any character in regex).
$OldVersion = 'jre1\.8\.0_\d{3}' # backslash escape the dots. \d{3} means 3 digits
$NewVersion = 'jre1.8.0_261' # this is just a string to replace with; not regex
((Get-Content -path $ConfPath -Raw) -replace $OldVersion,$NewVersion) | Set-Content -Path $ConfPath
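A quick sanity check of the pattern against a made-up configuration line (the path is just an illustration, not from the question):
'wrapper.java.command=C:\java\jre1.8.0_202\bin\java' -replace 'jre1\.8\.0_\d{3}', 'jre1.8.0_261'
# -> wrapper.java.command=C:\java\jre1.8.0_261\bin\java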
I researched it and figured it out using regex. The code below does exactly what I wanted to do:
((Get-Content -path $ConfPath -Raw) -replace 'jre1.8.0_...',$NewVersion) | Set-Content -Path $ConfPath

Batch File to Find and Replace in text file using whole word only?

I am writing a script which at one point has to check in a text file and remove certain strings. So far I have this:
powershell -Command "(gc myFile.txt) -replace 'foo', 'bar' | Out-File -encoding ASCII myFile.txt"
The only problem is that this can find and replace, but it will not remove the line altogether.
The second problem is that if, say, I am removing the line that has Mark, it needs to not remove a line that has something like Markus.
I don't know if this is possible with the PowerShell interface?
Your current code will only replace foo with bar; that is what -replace does.
Removing the whole line if it matches requires a different approach, almost backwards: you can use -notmatch to output only the lines that do not match your filter, effectively removing the ones that do.
Also, using regex word boundaries will then only match Mark but not Markus:
(Get-Content file.txt) | Where-Object {$_ -notmatch "\bMark\b"} | Set-Content file.txt
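To see the word boundary in action (made-up sample strings, not from the question):
'Mark went home' -notmatch '\bMark\b'    # False -> this line gets removed
'Markus went home' -notmatch '\bMark\b'  # True  -> this line is kept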

Matching lines in file from list

I have two text files, Text1.txt and Text2.txt
Text2.txt is a list of keywords, one keyword per line. I want to read from Text1.txt and any time a keyword in the Text2.txt list shows up, pipe that entire line of text to a new file, output.txt
Without using Text2.txt I figured out how to do it manually in PowerShell.
Get-Content .\Text1.txt | Where-Object {$_ -match 'CAPT'} | Set-Content output.txt
That seems to work, it searches for "CAPT" and returns the entire line of text, but I don't know how to replace the manual text search with a variable that pulls from Text2.txt
Any ideas?
Using some simple regex, you can build an alternation pattern from all the keywords in the file Text2.txt:
$pattern = (Get-Content .\Text2.txt | ForEach-Object{[regex]::Escape($_)}) -Join "|"
Get-Content .\Text1.txt | Where-Object {$_ -match $pattern} | Set-Content output.txt
In case your keywords contain special regex characters, we need to be sure they are escaped here; the .NET regex method Escape() handles that.
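For example, this (made-up) keyword shows what Escape() does to regex metacharacters:
[regex]::Escape('C++ (v2)')   # -> C\+\+\ \(v2\)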
This is not an efficient approach for large files, but it is certainly a simple method. If your keywords were all similar, like CAPT, CAPS, CAPZ, then we could improve it, but I don't think it would be worth it, depending on how often the keywords change.
Changing the pattern
If you wanted to match on just the first 4 characters of each keyword in your keyword file, that is just a matter of making a change in the loop:
$pattern = (Get-Content .\Text2.txt | ForEach-Object{[regex]::Escape($_.Substring(0,4))}) -Join "|"
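One caveat not covered in the original answer: Substring(0,4) will throw if any keyword line is shorter than four characters, so a defensive (hedged) variant might filter those out first:
$pattern = (Get-Content .\Text2.txt |
    Where-Object { $_.Length -ge 4 } |
    ForEach-Object { [regex]::Escape($_.Substring(0,4)) }) -Join "|"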