PowerShell not removing new line characters - powershell

Environment: Windows 10 pro 20H2, PowerShell 5.1.19041.1237
The following PowerShell code is not replacing the newline character(s) in a .txt file with " ". Question: What might I be missing here, and how can I make it work?
C:\MyFolder\Test.txt File:
This is first line.
This is second line.
This is third line.
This is fourth line.
Desired output [after replacing the newline characters with " " character]:
This is first line. This is second line. This is third line. This is fourth line.
PowerShell code:
PS C:\MyFolder\test.txt> $content = get-content "Test.txt"
PS C:\MyFolder\test.txt> $content = $content.replace("`r`n", " ")
PS C:\MyFolder\test.txt> $content | out-file "Test.txt"
Remarks
The above code works fine if I replace some other character(s) in file. For example, if I change the second line of the above code with $content = $content.replace("third", "3rd"), the code successfully replaces third with 3rd in the above file.

You need to pass the -Raw parameter to Get-Content. By default, without -Raw, the content is returned as an array of strings, one per line, with the newlines already stripped.
Get-Content "Test.txt" -Raw
Quoting from the documentation,
-Raw
Ignores newline characters and returns the entire contents of a file in one string with the newlines preserved. By default, newline
characters in a file are used as delimiters to separate the input into
an array of strings. This parameter was introduced in PowerShell 3.0.
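To make the difference concrete, here is a minimal sketch. The temp-file name raw-demo.txt and the variable names are made up for illustration; they are not part of the question.

```powershell
# Hypothetical two-line sample file in the temp folder
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'raw-demo.txt'
Set-Content -Path $path -Value 'This is first line.', 'This is second line.'

$lines = Get-Content -Path $path        # array: one element per line, newlines stripped
$text  = Get-Content -Path $path -Raw   # single string, newlines preserved

$lines.Count                            # 2
$hasNewlineInLines = [bool](($lines -join '') -match '\n')   # no newline left to replace
$hasNewlineInText  = [bool]($text -match '\r?\n')            # newline still present
"Array elements contain a newline: $hasNewlineInLines"
"Raw string contains a newline:    $hasNewlineInText"

Remove-Item -Path $path
```

This is why .Replace("`r`n", " ") appears to do nothing on the default output: each array element is already a single line.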

The simplest way of doing this is not to read the file with -Raw and then do a replacement, but to make use of the fact that Get-Content already splits the content on newlines for you.
All it then takes is to join the array with a space character.
(Get-Content -Path "Test.txt") -join ' ' | Set-Content -Path "Test.txt"
As for what you have tried:
By using Get-Content without the -Raw switch, the cmdlet returns a string array of lines split on the Newlines.
That means there are no Newlines in the resulting strings anymore to replace and all that is needed is to 'stitch' the lines together with a space character.
If you do use the -Raw switch, the cmdlet returns a single, multiline string including the Newlines.
In your case, you then need to do the splitting or replacing yourself and for that, don't use the string method .Replace, but the regex operator -split or -replace with a search string '\r?\n'.
The question mark in there makes sure you split on newlines in Windows format (CRLF), but also works on *nix format (LF).
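A minimal sketch of the -Raw plus -replace route, assuming a sample file with mixed CRLF and LF line endings. The file name newline-demo.txt is hypothetical, and stripping the final newline first (so the result has no trailing space) is a choice made here, not something the answer prescribes.

```powershell
# Hypothetical sample file with mixed line endings and a trailing newline
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'newline-demo.txt'
[System.IO.File]::WriteAllText($path, "This is first line.`r`nThis is second line.`nThis is third line.`r`n")

# Read the file as one string
$raw = Get-Content -Raw -Path $path

# First drop the newline at the very end (\z), then turn every remaining
# CRLF or LF into a single space
$joined = $raw -replace '\r?\n\z' -replace '\r?\n', ' '
$joined

Remove-Item -Path $path
```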

Related

How to compare file on local PC and github?

I have a file on my PC called test.ps1.
I have a file hosted on my GitHub, also called test.ps1.
Both of them have the same contents: a single string.
I am using the following script to try to compare them:
$fileA = Get-Content -Path "C:\Users\User\Desktop\test.ps1"
$fileB = (Invoke-webrequest -URI "https://raw.githubusercontent.com/repo/Scripts/test.ps1")
if(Compare-Object -ReferenceObject $fileA -DifferenceObject ($fileB -split '\r?\n'))
{"files are different"}
Else {"Files are the same"}
echo ""
Write-Host $fileA
echo ""
Write-Host $fileB
However, my output shows the exact same data for both, yet it says the files are different. The output:
files are different
a string
a string
Is there some weird EOL thing going on or something?
tl;dr
# Remove a trailing newline from the downloaded file content
# before splitting into lines.
# Parameter names omitted for brevity.
Compare-Object $fileA ($fileB -replace '\r?\n\z' -split '\r?\n' )
If the files are truly identical (save for any character-encoding and newline-format differences, and whether or not the local file has a trailing newline), you'll see no output (because Compare-Object only reports differences by default).
If the lines look the same, it sounds like character encoding is not the problem, though it's worth pointing out that Get-Content in Windows PowerShell, in the absence of a BOM, assumes that a file is ANSI-encoded, so a UTF-8 file without BOM that contains characters outside the ASCII range will be misinterpreted - use -Encoding utf8 to fix that.
Assuming that the files are truly identical (including not having variations in whitespace, such as trailing spaces at the end of lines), the likeliest explanation is that the file being retrieved has a trailing newline, as is typical for text files.
Thus, if the downloaded file has a trailing newline, as is to be expected, if you apply -split '\r?\n' to the multi-line string representing the entire file content in order to split it into lines, you'll end up with an extra, empty array element at the end, which causes Compare-Object to report that element as a difference.
Compare-Object emitting an object is evaluated as $true in the implied Boolean context of your if statement's conditional, which is why files are different is output.
The above -replace operation, -replace '\r?\n\z' (\z matches the very end of a (multi-line) string), compensates for that, by removing the trailing newline before splitting into lines.
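The effect can be reproduced without any download. In this sketch, the in-memory strings stand in for the local file's lines and the downloaded content; the variable names are made up for illustration.

```powershell
# What Get-Content yields for a one-line local file
$localLines = 'a string'
# What a downloaded file with a trailing newline looks like as one string
$downloaded = "a string`n"

# Splitting as-is leaves an extra, empty element at the end
$naive = @($downloaded -split '\r?\n')
# Removing the trailing newline first avoids that
$fixed = @($downloaded -replace '\r?\n\z' -split '\r?\n')

$naive.Count    # 2
$fixed.Count    # 1

# Compare-Object emits output only for differences; any output is truthy
$naiveDiffers = [bool](Compare-Object $localLines $naive)
$fixedDiffers = [bool](Compare-Object $localLines $fixed)
"Naive split reports a difference: $naiveDiffers"
"Fixed split reports a difference: $fixedDiffers"
```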

Remove the LF from the test file and leave the CRLF (batch, PowerShell)

I have a .txt file in which some of the lines end with LF (there should not be a new line there), while the others end with CRLF. How can I remove the LF and leave the CRLF?
This is my original file content:
I am trying the following code, which doesn't work because it doesn't remove the unnecessary line feeds (LF):
(Get-Content $path -Raw).Replace("(?!\r)\n"," ") | Set-Content $path -Force
This is my intended result
For clarity, the intention is therefore, to replace all LF characters, which are not immediately preceded with a CR character, with a single space character.
P.S. Any PowerShell code must be compatible with version 2.0
.Replace() doesn't accept a regex the way -replace does, and you're using a negative lookahead (?!) instead of a negative lookbehind (?<!). regex101.com is useful for seeing what a regular expression does. With raw strings, you may want the -NoNewline option of Set-Content.
"12345`n56789`n09876`r`n" -replace "(?<!\r)\n", " "
12345 56789 09876
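Since the question asks for PowerShell 2.0 compatibility and -Raw only arrived in 3.0, one possible way to apply the same regex to a whole file is to read and write it through .NET. This is a sketch under assumptions: the file name lf-demo.txt is hypothetical, and ReadAllText/WriteAllText default to UTF-8, so pass an explicit encoding if the real file uses another one.

```powershell
# Hypothetical sample file: two stray LFs, then CRLF-terminated lines
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'lf-demo.txt'
[System.IO.File]::WriteAllText($path, "12345`n56789`n09876`r`nabc`r`n")

# Read the whole file as one string without Get-Content -Raw
$text = [System.IO.File]::ReadAllText($path)

# Replace every LF that is NOT preceded by CR with a space; CRLF stays intact
$fixed = $text -replace '(?<!\r)\n', ' '

# Write back as-is; no extra newline is appended
[System.IO.File]::WriteAllText($path, $fixed)

$roundTrip = [System.IO.File]::ReadAllText($path)
Remove-Item -Path $path
```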
Normally, Get-Content will break and stream lines on any combination of CR, LF, or CRLF.
You might prevent this with the -Delimiter "`r`n" parameter:
Get-Content $InPath -Delimiter "`r`n" |
ForEach-Object Replace "`n", " " |
Set-Content $OutPath
Tested using Powershell version 7.1.3
I suppose you want to do something like this? This is assuming no whitespace actually exists on the end of each line:
@echo off
set "fd="
for /f "delims=" %%a in (input.txt) do call set "fd=%%fd%%%%a "
(echo %fd:~0,-1%)>result.txt
You have both powershell and batch-file tags, hence a batch-file solution.

How to efficiently delete the last line of a multiline string when the line is empty/blank?

I'm trying to delete the blank line at the bottom of each sqlcmd output file provided by another vendor.
$List=Get-ChildItem * -include *.csv
foreach($file in $List) {
$data = Get-Content $file
$name = $file.name
$length = $data.length -1
$data[$length] = $null
$data | Out-File $name -Encoding utf8
}
It takes quite a long time to remove the blank line. Does anyone know a more efficient way?
Using Get-Content -Raw to load files as a whole, as a single string into memory and operating on that string will give you the greatest speed boost.
While that isn't always an option depending on file size, you mention sqlcmd files, which can be assumed to be small enough.
Note:
By blank line I mean a line that is either completely empty or contains whitespace (other than newlines) only.
The trimmed string will not have a final terminating newline following the last line, but if you pass it to Set-Content (or Out-File), one will be appended by default; use -NoNewline to suppress that, but note that, especially on Unix-like platforms, even the last line of a text file is expected to have a trailing newline.
Trailing (or leading) whitespace on a non-blank line is by design not trimmed, except where noted.
The solutions use the -replace operator, which operates on regexes (regular expressions).
Remove all trailing blank lines:
Note: If you really want to remove only the last line if it happens to be blank, see the second-to-last solution below.
(Get-Content -Raw $file) -replace '\r?\n\s*$'
In the context of your command (slightly modified):
Get-ChildItem -Filter *.sqlcmd | ForEach-Object {
(Get-Content -Raw $_.FullName) -replace '\r?\n\s*$' |
Set-Content $_.FullName -Encoding utf8 -WhatIf # save back to same file
}
Note: The -WhatIf common parameter in the command above previews the operation. Remove -WhatIf once you're sure the operation will do what you want.
If it's acceptable / desirable to also trim trailing whitespace from the last non-blank line, you can more simply write:
(Get-Content -Raw $file).TrimEnd()
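A small in-memory sketch of how the two variants above differ. The sample string stands in for a file read with Get-Content -Raw and is made up for illustration.

```powershell
# Lines: 'a', 'b' followed by two spaces, an empty line, a whitespace-only line
$sample = "a`r`nb  `r`n`r`n  `r`n"

# Removes the trailing blank lines but keeps the spaces after 'b'
$keepSpaces = $sample -replace '\r?\n\s*$'

# Removes all trailing whitespace, including the spaces after 'b'
$trimAll = $sample.TrimEnd()

# Show the results with visible delimiters
"[$keepSpaces]"
"[$trimAll]"
```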
Remove all blank lines, wherever they occur in the file:
(Get-Content -Raw $file) -replace '(?m)\A\s*\r?\n|\r?\n\s*$'
Here's a conceptually much simpler version that operates on the array of lines output by Get-Content without -Raw (and also returns an array), but it performs much worse.
@(Get-Content $file) -notmatch '^\s*$'
Do not combine this with Set-Content / Out-File -NoNewline, as that will concatenate the lines stored in the array elements directly, without line breaks between them. Without -NoNewline, you'll invariably get a terminating newline after the last line.
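To illustrate the array-based filter without touching a file, here is a sketch where a literal array stands in for the output of Get-Content. The sample values are made up.

```powershell
# Stand-in for the lines returned by Get-Content (without -Raw)
$lines = 'a', '', '   ', 'b'

# With an array on the left, -notmatch acts as a filter:
# it returns only the elements that do NOT match the regex
$kept = @($lines) -notmatch '^\s*$'

$kept.Count    # 2

# Joining with an explicit newline gives one string without a trailing newline
$asText = $kept -join [Environment]::NewLine
$asText
```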
Remove only the last line if it is blank:
(Get-Content -Raw $file) -replace '\r?\n[ \t]*\Z'
Note:
[ \t] matches spaces and tabs, whereas \s more generally matches all forms of Unicode whitespace, including that outside the ASCII range.
An optional trailing newline at the very end of the file (to terminate the last line) is not considered a blank line in this case - whether such a newline is present or not does not make a difference.
Unconditionally remove the last line, whether it is blank or not:
(Get-Content -Raw $file) -replace '\r?\n[^\n]*\Z'
Note:
An optional trailing newline at the very end of the file (to terminate the last line) is not considered a blank line in this case - whether such a newline is present or not does not make a difference.
If you want to remove the last non-blank line, use
(Get-Content -Raw $file).TrimEnd() -replace '\r?\n[^\n]*\Z'
Try replacing that line with this one; you will then have no blank lines in your array variable $data.
$data = Get-Content $file.FullName | Where-Object { $_.Trim() -ne "" }

Can I use Powershell to automatically extract an unknown string with a specific pattern from a XML file and write that string in a text file?

In a XML file with 100 lines of code, there is one string with a specific pattern that I want to find and write into a new text file.
What the string contains is unknown and can vary, but the pattern is the same. For example:
12hi34
99ok45
What they have in common is a length of 6, with the following at each position:
0-1: integers
2-3: characters
4-5: integers
Is there a way to use Powershell and write a script that can find the string that fit the pattern and export it in a text file?
I'm new to Powershell and scripting. Tried to Google the problem and stumbled upon Select-String, but that doesn't solve my problem. Hope some of you can guide me here. Thanks.
Edit: The string is outside the root element as some "free text". It is not a traditional XML file.
Assuming there's only one token of interest in the file, and that the letters are limited to English letters 'a' through 'z':
(Get-Content -Raw in.xml) -replace '(?s).*(\d{2}[a-z]{2}\d{2}).*', '$1' > out.txt
Note:
If no matching token is found, the input file's entire content is written to out.txt.
On Windows PowerShell > produces UTF-16LE ("Unicode") files by default (in PowerShell Core it is UTF-8 without a BOM); pipe to Set-Content out.txt -Encoding ... instead to create a file with a different encoding.
Get-Content -Raw reads the entire input file as a single string.
The -replace operator uses regular expressions (regexes) for matching - see this answer for more information.
Inline option (?s) at the start of regex makes . match newlines too.
By default, matching is case-insensitive; use -creplace for case-sensitive matching.
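If writing the whole input to the output file on a failed match is undesirable, one possible alternative is to extract the token with [regex]::Match and check whether anything was found. This is a sketch, not the answer's method; the sample text and the variable names are made up, and an in-memory string stands in for Get-Content -Raw in.xml.

```powershell
# Stand-in for the file content: XML followed by free text containing the token
$xmlText = "<root>`n  <item>x</item>`n</root>`nfree text 12hi34 here`n"

# Two digits, two letters, two digits; IgnoreCase mirrors -replace's default
$m = [regex]::Match($xmlText, '\d{2}[a-z]{2}\d{2}', 'IgnoreCase')

if ($m.Success) {
    $token = $m.Value      # only the matched token
} else {
    $token = $null         # nothing found; nothing should be written
}
$token
```

With a real file, $token could then be piped to Set-Content out.txt only when it is not $null.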
Try this...
$f = Get-Content '<xml-file>' -ReadCount 0
foreach ($l in $f) {
if ($l -match '[0-9]{1,3}[a-zA-Z]{2,3}[0-9]{1,5}') {
Write-Output $matches.0
}
}
Stuffing the contents of a file into a variable. Iterating over each line of the file. Parsing out the value by pattern.
Here is a sample of the matching piece...

stripping extra text qualifier from a CSV - part 2

For part 1, see this SO post
I have a CSV that has certain fields separated by the " symbol as a TextQualifier.
See below for example. Note that each integer (eg. 1,2,3 etc) is supposed to be a string. the qualified strings are surrounded by the " symbol.
1,2,3,"qualifiedString1",4,5,6,7,8,9,10,11,12,13,14,15,16,"qualifiedString2""
Notice how the last qualified string has a " symbol as part of the string.
User @mjolinor suggested this PowerShell script, which works to fix the above scenario, but it does not fix the "Part 2" scenario below.
(get-content file.txt -ReadCount 0) -replace '([^,]")"','$1' |
set-content newfile.txt
Here is part 2 of the question. I need a solution for this:
The extra " symbol can appear randomly in the string. Here's another example:
1,2,3,"qualifiedString1",4,5,6,7,8,9,10,11,12,13,14,15,16,"qualifiedS"tring2"
Can you suggest an elegant way to automate the cleaning of the CSV to eliminate redundant " qualifiers?
You just need a different regex:
(get-content file.txt -ReadCount 0) -replace '(?<!,)"(?!,|$)',''|
set-content newfile.txt
That one removes any double quote that is neither immediately preceded by a comma nor immediately followed by a comma or the end of the line.
$text = '1,2,3,"qualifiedString1",4,5,6,7,8,9,10,11,12,13,14,15,16,"qualifiedS"tring2"'
$text -replace '(?<!,)"(?!,|$)',''
1,2,3,"qualifiedString1",4,5,6,7,8,9,10,11,12,13,14,15,16,"qualifiedString2"
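One case the sample data does not exercise is a qualified string in the very first field: there the opening quote has no comma before it, so the regex above would strip it. The following sketch shows that and a possible variant that requires a non-comma character on both sides of the quote. The variant regex and the sample line are assumptions added for illustration, and the variant assumes line-by-line input, as Get-Content provides.

```powershell
# Hypothetical line whose FIRST field is qualified
$line = '"first",2,"qualifiedS"tring2"'

# The regex from the answer also removes the line-leading quote
$withOriginal = $line -replace '(?<!,)"(?!,|$)', ''

# Variant: only remove a quote that has a non-comma character before AND after it,
# which leaves quotes at the start and end of the line alone
$withVariant = $line -replace '(?<=[^,])"(?=[^,])', ''

$withOriginal
$withVariant
```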