How can I replace newlines using PowerShell?

Given test.txt containing:
test
message
I want to end up with:
testing
a message
I think the following should work, but it doesn't:
Get-Content test.txt |% {$_-replace "t`r`n", "ting`r`na "}
How can I do a find and replace where what I'm finding contains CRLF?

A CRLF is two characters, of course: the CR and the LF. However, `n is only the LF, and a string entered across two lines at the prompt contains `n but no `r. For example:
PS C:\> $x = "Hello
>> World"
PS C:\> $x
Hello
World
PS C:\> $x.contains("`n")
True
PS C:\> $x.contains("`r")
False
PS C:\> $x.replace("o`nW","o There`nThe W")
Hello There
The World
PS C:\>
I think you're running into problems with the `r. I was able to remove the `r from your example, use only `n, and it worked. Of course, I don't know exactly how you generated the original string so I don't know what's in there.
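If you are not sure which newline characters a string actually contains, one way to check is to look at its character codes. This is only a minimal sketch; the sample string and the variable names are made up for illustration:

```powershell
# Hypothetical sample; built explicitly so the newline characters are known.
$x = "Hello`r`nWorld"

# Convert every character to its code: CR is 13, LF is 10.
$codes = [int[]][char[]]$x

$hasCR = $codes -contains 13
$hasLF = $codes -contains 10
"CR present: $hasCR, LF present: $hasLF"
```

Running the same check on the string you are actually searching should tell you whether a pattern containing `r can ever match.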

In my understanding, Get-Content eliminates ALL newlines/carriage returns when it rolls your text file through the pipeline. To do multiline regexes, you have to re-combine your string array into one giant string. I do something like:
$text = [string]::Join("`n", (Get-Content test.txt))
[regex]::Replace($text, "t`n", "ting`na ", "Singleline")
Clarification: small files only folks! Please don't try this on your 40 GB log file :)

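With Get-Content -Raw (PowerShell 3.0 and later) the file is read as a single string with its newlines intact, so you should get what you expect.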
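A minimal sketch of the -Raw approach applied to the question's example. It assumes the file uses CRLF line endings; the temp file name is hypothetical and only stands in for test.txt:

```powershell
# Hypothetical temp file standing in for test.txt, written with CRLF line endings.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'raw-demo-test.txt'
[System.IO.File]::WriteAllText($path, "test`r`nmessage`r`n")

# -Raw returns one string with the newlines intact, so the regex can see the CRLF.
$text = Get-Content -Path $path -Raw

# The pattern is a regex (\r\n); the replacement is a normal PowerShell string.
$result = $text -replace 't\r\n', "ting`r`na "

Remove-Item -Path $path
$result
```

If the file might use LF only, a pattern such as 't\r?\n' would cover both cases.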

If you want to remove all new line characters and replace them with some character (say comma) then you can use the following.
(Get-Content test.txt) -join ","
This works because Get-Content returns an array of lines. You can think of it as the tokenize function available in many languages.

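You can also use "\\r\\n" for the newline. I have used this in the ServiceNow tool.
In my case "\r\n" was not working, so I tried "\\r\\n". Note that the backslash is not an escape character in PowerShell strings themselves (that is the backtick); it is the escape character in regular expressions, so the doubling was presumably needed because the tool consumes one level of backslashes before the script runs.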

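You can detect if a file is CRLF (more precisely, whether it ends with CRLF) with this simple PowerShell instruction:
(cat -Raw $args) -match "\r\n$"
Replacing \n with \r\n is tricky because you have to detect it first and apply the replacement only where it is needed; otherwise you end up with a mess.
If you would rather skip detection, you can make sure a file is CRLF whatever its original type like this (on Windows, where PowerShell writes lines with CRLF). The parentheses matter: they read the whole file before it is reopened for writing, whereas cat $file > $file tries to overwrite the file while it is still being read, which typically fails or leaves it empty:
(cat $file) | Set-Content $file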
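Another way to avoid the detection step is a single regex replacement that is safe to apply to either kind of line ending. The sketch below works on an in-memory string (a made-up sample with mixed endings); reading and writing the file is left out:

```powershell
# Hypothetical sample with mixed line endings: LF, CRLF, LF.
$mixed = "one`ntwo`r`nthree`n"

# '\r?\n' matches both LF and CRLF, so every line ending becomes exactly one CRLF.
# Existing CRLF pairs are replaced by themselves rather than doubled up.
$crlf = $mixed -replace '\r?\n', "`r`n"

# Sanity check: no LF is left that is not preceded by a CR.
$hasBareLF = $crlf -match '(?<!\r)\n'
"Bare LF remaining: $hasBareLF"
```

Because an existing CRLF is matched as a whole, running the replacement twice gives the same result as running it once.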

Related

PowerShell not removing new line characters

Environment: Windows 10 pro 20H2, PowerShell 5.1.19041.1237
In a .txt file, my PowerShell code below is not replacing the newline character(s) with " ". Question: What might I be missing here, and how can I make it work?
C:\MyFolder\Test.txt File:
This is first line.
This is second line.
This is third line.
This is fourth line.
Desired output [after replacing the newline characters with " " character]:
This is first line. This is second line. This is third line. This is fourth line.
PowerShell code:
PS C:\MyFolder\test.txt> $content = get-content "Test.txt"
PS C:\MyFolder\test.txt> $content = $content.replace("`r`n", " ")
PS C:\MyFolder\test.txt> $content | out-file "Test.txt"
Remarks
The above code works fine if I replace some other character(s) in the file. For example, if I change the second line of the above code to $content = $content.replace("third", "3rd"), the code successfully replaces third with 3rd in the above file.
You need to pass the -Raw parameter to Get-Content. By default, without the -Raw parameter, content is returned as an array of newline-delimited strings.
Get-Content "Test.txt" -Raw
Quoting from the documentation,
-Raw
Ignores newline characters and returns the entire contents of a file in one string with the newlines preserved. By default, newline
characters in a file are used as delimiters to separate the input into
an array of strings. This parameter was introduced in PowerShell 3.0.
The simplest way is not to use the -Raw switch followed by a replacement, but to make use of the fact that Get-Content splits the content on newlines for you.
All it then takes is to join the array with a space character.
(Get-Content -Path "Test.txt") -join ' ' | Set-Content -Path "Test.txt"
As for what you have tried:
By using Get-Content without the -Raw switch, the cmdlet returns a string array of lines split on the Newlines.
That means there are no Newlines in the resulting strings anymore to replace and all that is needed is to 'stitch' the lines together with a space character.
If you do use the -Raw switch, the cmdlet returns a single, multiline string including the Newlines.
In your case, you then need to do the splitting or replacing yourself and for that, don't use the string method .Replace, but the regex operator -split or -replace with a search string '\r?\n'.
The question mark in there makes sure you split on newlines in Windows format (CRLF), but also works on *nix format (LF).
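For comparison, the -Raw variant described above could look like the following sketch. It runs on an in-memory string instead of Test.txt, and the sample deliberately mixes CRLF and LF to show the effect of '\r?\n'; the variable names are illustrative:

```powershell
# Stand-in for (Get-Content -Path "Test.txt" -Raw); note the mixed line endings.
$raw = "This is first line.`r`nThis is second line.`nThis is third line.`r`n"

# TrimEnd() drops trailing whitespace, including the final newline,
# so the result does not end with a stray space.
$oneLine = $raw.TrimEnd() -replace '\r?\n', ' '
$oneLine
```

With the string method .Replace("`r`n", " ") the LF-only line ending in the middle would have been left untouched.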

Remove the LF from the test file and leave the CRLF (batch, PowerShell)

I have a .txt file in which some of the lines end with LF (there should not be a new line here), and the rest end with CRLF. How can I remove the LF and leave the CRLF?
This is my original file content:
I am trying the following code (doesn't work because it doesn't remove unnecessary line feeds (LF))
(Get-Content $path -Raw).Replace("(?!\r)\n"," ") | Set-Content $path -Force
This is my intended result
For clarity, the intention is therefore, to replace all LF characters, which are not immediately preceded with a CR character, with a single space character.
P.S. Any PowerShell code must be compatible with version 2.0
.replace() doesn't accept regex like -replace does. And you're using negative lookahead (?!) instead of negative lookbehind (?<!). regex101.com is useful for seeing what a regular expression does. With raw strings, you may want the -nonewline option to set-content.
"12345`n56789`n09876`r`n" -replace "(?<!\r)\n", " "
12345 56789 09876
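Since the question asks for PowerShell 2.0 compatibility and -Raw is not available there, one option is to read and write the file through the .NET System.IO.File class instead. This is a sketch under assumptions: the temp file name and sample content are made up, the file is assumed to fit in memory, and WriteAllText is left at its default UTF-8 encoding:

```powershell
# Hypothetical temp file standing in for $path, with mixed LF and CRLF endings.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'lf-demo.txt'
[System.IO.File]::WriteAllText($path, "12345`n56789`n09876`r`nabc`ndef`r`n")

# ReadAllText returns the whole file as one string, newlines included.
$text = [System.IO.File]::ReadAllText($path)

# Replace only those LFs that are NOT preceded by a CR (negative lookbehind).
$fixed = $text -replace '(?<!\r)\n', ' '

# WriteAllText writes the string as-is, without appending an extra newline.
[System.IO.File]::WriteAllText($path, $fixed)

$check = [System.IO.File]::ReadAllText($path)
Remove-Item -Path $path
$check
```

Writing with WriteAllText also sidesteps the extra trailing newline that Set-Content would add.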
Normally, Get-Content will break and stream lines on any combination of CR, LF, or both (CRLF).
You might prevent this with the -Delimiter "`r`n" parameter:
Get-Content $InPath -Delimiter "`r`n" |
ForEach-Object Replace "`n", " " |
Set-Content $OutPath
Tested using Powershell version 7.1.3
I suppose you want to do something like this? This is assuming no whitespace actually exists on the end of each line:
@echo off
set "fd="
for /f "delims=" %%a in (input.txt) do call set "fd=%%fd%%%%a "
(echo %fd:~0,-1%)>result.txt
You have both powershell and batch-file tags, hence a batch-file solution.

Can I use PowerShell to automatically extract an unknown string with a specific pattern from an XML file and write that string in a text file?

In an XML file with 100 lines of code, there is one string with a specific pattern that I want to find and write into a new text file.
What the string contains is unknown and can vary, but the pattern is the same. For example:
12hi34
99ok45
What they have in common is a length of 6, with these character positions:
0-1: digits
2-3: letters
4-5: digits
Is there a way to use Powershell and write a script that can find the string that fit the pattern and export it in a text file?
I'm new to Powershell and scripting. Tried to Google the problem and stumbled upon Select-String, but that doesn't solve my problem. Hope some of you can guide me here. Thanks.
Edit: The string is outside the root element as some "free text". It is not a traditional XML file.
Assuming there's only one token of interest in the file, and that the letters are limited to English letters 'a' through 'z':
(Get-Content -Raw in.xml) -replace '(?s).*(\d{2}[a-z]{2}\d{2}).*', '$1' > out.txt
Note:
If no matching token is found, the input file's entire content is written to out.txt.
On Windows PowerShell > produces UTF-16LE ("Unicode") files by default (in PowerShell Core it is UTF-8 without a BOM); pipe to Set-Content out.txt -Encoding ... instead to create a file with a different encoding.
Get-Content -Raw reads the entire input file as a single string.
The -replace operator uses regular expressions (regexes) for matching - see this answer for more information.
Inline option (?s) at the start of regex makes . match newlines too.
By default, matching is case-insensitive; use -creplace for case-sensitive matching.
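If writing the whole file to out.txt on a failed match is not acceptable, an alternative is to search for the token directly with [regex]::Match. The sketch below uses a made-up in-memory sample in place of in.xml, and the \b word boundaries are an added assumption that the token is not glued to other letters or digits:

```powershell
# Hypothetical stand-in for (Get-Content -Raw in.xml): free text outside the root element.
$content = "free text 12hi34 before root`n<root><item>value</item></root>`n"

# Two digits, two letters, two digits, delimited by word boundaries.
$options = [System.Text.RegularExpressions.RegexOptions]::IgnoreCase
$m = [regex]::Match($content, '\b\d{2}[a-z]{2}\d{2}\b', $options)

# Only use the value if something was actually found.
if ($m.Success) { $token = $m.Value } else { $token = $null }
$token
```

From there, something like `$token | Set-Content out.txt` would write the result, and nothing needs to be written at all when $token is $null.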
Try this...
$f = Get-Content '<xml-file>' -ReadCount 0
foreach ($l in $f) {
if ($l -match '[0-9]{1,3}[a-zA-Z]{2,3}[0-9]{1,5}') {
Write-Output $matches.0
}
}
This stuffs the contents of the file into a variable, iterates over each line of the file, and parses out the value by pattern.
Here is a sample of the matching piece...

How to remove a multi line block of text from $pattern in Powershell

I'm getting the contents of a text file which is partly created by gsutil and I'm trying to put its contents in $body but I want to omit a block of text that contains special characters. The problem is that I'm not able to match this block of text in order for it to be removed. So when I print out $body it still contains all the text that I'm trying to omit.
Here's a part of my code:
$pattern = @"
==> NOTE: You are uploading one or more large file(s), which would run
significantly faster if you enable parallel composite uploads. This
feature can be enabled by editing the
"parallel_composite_upload_threshold" value in your .boto
configuration file. However, note that if you do this you and any
users that download such composite files will need to have a compiled
crcmod installed (see "gsutil help crcmod").
"@
$pattern = ([regex]::Escape($pattern))
$body = Get-Content -Path C:\temp\file.txt -Raw | Select-String -Pattern $pattern -NotMatch
So basically I need it to display everything inside the text file except for the block of text in $pattern. I tried without -Raw and without ([regex]::Escape($pattern)) but it won't remove that entire block of text.
It has to be because of the special characters, probably the " , . () because if I make the pattern simple such as:
$pattern = @"
NOTE: You are uploading one or more
"@
then it works and this part of text is removed from $body.
It'd be nice if everything inside $pattern between the @" and "@ was treated literally. I'd like the simplest solution without functions, etc. I'd really appreciate it if someone could help me out with this.
With the complete text of your question stored in file .\SO_55538262.txt
This script with manually escaped pattern:
$pattern = '(?sm)^==\> NOTE: You .*?"gsutil help crcmod"\)\.'
$body = (Get-Content .\SO_55538262.txt -raw) -replace $pattern
$body
Returns here:
I'm getting the contents of a text file which is partly created by gsutil and I'm trying to put its contents in $body but I want to omit a block of text that contains special characters. The problem is that I'm not able to match this block of text in order for it to be removed. So when I print out $body it still contains all the text that I'm trying to omit.
Here's a part of my code:
$pattern = @"
"@
$pattern = ([regex]::Escape($pattern))
$body = Get-Content -Path C:\temp\file.txt -Raw | Select-String -Pattern $pattern -NotMatch
So basically I need it to display everything inside the text file except for the block of text in $pattern. I tried without -Raw and without ([regex]::Escape($pattern)) but it won't remove that entire block of text.
It has to be because of the special characters, probably the " , . () because if I make the pattern simple such as:
$pattern = @" NOTE: You are uploading one or more "@
then it works and this part of text is removed from $body.
It'd be nice if everything inside $pattern between the @" and "@ was treated literally. I'd like the simplest solution without functions, etc.
Explanation of the RegEx from regex101.com:
(?sm)^==\> NOTE: You .*?"gsutil help crcmod"\)\.
(?sm) match the remainder of the pattern with the following effective flags: gms
s modifier: single line. Dot matches newline characters
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
^ asserts position at start of a line
== matches the characters == literally (case sensitive)
\> matches the character > literally (case sensitive)
NOTE: You matches the characters NOTE: You literally (case sensitive)
.*?
. matches any character
*? Quantifier — Matches between zero and unlimited times, as few times as possible, expanding as needed (lazy)
"gsutil help crcmod" matches the characters "gsutil help crcmod" literally (case sensitive)
\) matches the character ) literally (case sensitive)
\. matches the character . literally (case sensitive)
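If you would rather keep the block as literal text than hand-escape a pattern, one possible sketch is to escape it line by line and join the lines with a newline-tolerant regex. Assumptions here: the block is shortened to two lines, the surrounding "Copying file..." and "Operation completed." lines are invented sample data, and $text stands in for the result of Get-Content -Raw:

```powershell
# Single-quoted here-string: everything between @' and '@ is taken literally.
$block = @'
==> NOTE: You are uploading one or more large file(s), which would run
significantly faster if you enable parallel composite uploads.
'@

# Hypothetical file content with CRLF line endings, containing the block.
$text = "Copying file...`r`n" + ($block -replace '\r?\n', "`r`n") + "`r`nOperation completed.`r`n"

# Escape each line separately, then join with '\r?\n' so the pattern matches
# regardless of whether the script and the file use the same line endings.
$escaped = ($block -split '\r?\n') | ForEach-Object { [regex]::Escape($_) }
$pattern = ($escaped -join '\r?\n') + '(\r?\n)?'

# -replace removes the block; Select-String -NotMatch only filters whole inputs.
$body = $text -replace $pattern, ''
$body
```

The trailing '(\r?\n)?' also swallows the line break after the block, so no empty line is left behind.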
An easy way to tackle this task (without regex) would be using the -notin operator. Since Get-Content is returning your file content as a string[]:
#requires -Version 4
$set = @('==> NOTE: You are uploading one or more large file(s), which would run'
'significantly faster if you enable parallel composite uploads. This'
'feature can be enabled by editing the'
'"parallel_composite_upload_threshold" value in your .boto'
'configuration file. However, note that if you do this you and any'
'users that download such composite files will need to have a compiled'
'crcmod installed (see "gsutil help crcmod").')
$filteredContent = @(Get-Content -Path $path).
Where({ $_.Trim() -notin $set }) # trim added for misc whitespace
v2 compatible solution:
@(Get-Content -Path $path) |
Where-Object { $set -notcontains $_.Trim() }

How can I remove CRLF if anywhere between double quotes, using PowerShell?

My text file looks like this.
"MikeCRLF","","","Dell","DevelCRLFCRLFoper"CRLF
"SuCRLFsan","","","Apple","ManagCRLFer"CRLF
Desired result:
"Mike","","","Dell","Developer"LF
"Susan","","","Apple","Manager"LF
I tried this on PowerShell:
$path = "C:\Users\abc\Desktop\1.txt"
(Get-Content $path -Raw).Replace("`r`n","`n") | Set-Content $path -Force
When I do this, I don't get the desired result. Also, I am left with one CRLF at the end. I don't want that either.
Please tell me how to do this using PowerShell v3.
This method avoids checking to see if \r\n is in quotes. Instead, it tries to find the "real" end of line situations and converts those first. Then it just purges the rest.
(Get-Content test.txt -Raw) -replace '([^,]")(\s*\r\n\s*)+("[^,])',"`$1`n`$3" -replace '\r\n',''
I think this should handle most of the stuff you throw at it, but let me know if you find a special case.
edited to fix the replacement string
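A different angle, shown here only as a sketch, is to handle the quoted fields explicitly: match each "..." field and strip the newlines inside it with a script block, then convert the remaining CRLFs to LF. It assumes fields never contain a lone unbalanced double quote; the sample is built from the question's notation by turning the literal text CRLF into real line breaks:

```powershell
# Build the sample from the question's notation: the text 'CRLF' becomes a real CR+LF.
$sample = ('"MikeCRLF","","","Dell","DevelCRLFCRLFoper"CRLF' +
           '"SuCRLFsan","","","Apple","ManagCRLFer"CRLF').Replace('CRLF', "`r`n")

# For every quoted field, remove any newlines found inside the quotes.
# The script block is invoked once per match and receives the Match object.
$clean = [regex]::Replace($sample, '"[^"]*"', { param($m) $m.Value -replace '\r?\n', '' })

# What is left are the real end-of-line CRLFs; turn them into LF.
$clean = $clean -replace '\r\n', "`n"
$clean
```

To write the result without Set-Content appending a CRLF of its own, [System.IO.File]::WriteAllText($path, $clean) writes the string exactly as it is.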
If you are using the PowerShell Community Extensions, you can use the ConvertTo-UnixLineEnding command e.g.:
ConvertTo-UnixLineEnding C:\users\abc\desktop1.txt -dest desktop1-converted.txt -Enc ascii