Split text after each end of line [duplicate] - powershell

I have a script that works perfectly fine with PowerShell 5.x, but no longer works on PowerShell Core (7.2.1).
The problem occurs when I try to split text (copied and pasted from an email).
It all comes down to this part of the code:
$test="blue
green
yellow
"
#$test.Split([Environment]::NewLine)
$x = $test.Split([Environment]::NewLine)
$x[0]
$x[1]
In PowerShell 5, $x[0] is "blue" and $x[1] is "green".
But in PowerShell Core the split doesn't do anything, and $x[1] does not exist.
I assume that line breaks are handled differently in PowerShell 7, but I couldn't find a solution.
I tried changing the code to
$rows = $path.split([Environment]::NewLine) and $path.Split([System.Environment]::NewLine, [System.StringSplitOptions]::RemoveEmptyEntries), but that doesn't change anything.
Also, when I use a here-string
$test = @'
green
yellow
blue
white
'@
$x= $test -split "`r`n", 5, "multiline"
Everything except $x[0] is empty (e.g. $x[2]).
I was already looking here: https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_split?view=powershell-7.2
And here: powershell -split('') specify a new line
And here: WT: Paste multiple lines to Windows Terminal without executing
So far I have not found a solution to my problem.
Any help is appreciated.
EDIT: I found a hint about that problem, but don't understand the implications of it yet: https://n-v-o.github.io/2021-06-10-String-Method-in-Powershell-7/
EDIT 2:
Thanks everyone for participating in answering my question.
At first I was going to write a long explanation of why my question is different from the duplicate suggested by @SantiagoSquarzon. But while reading the answers to my question and to the other question, I noticed that I was doing something differently.
Apparently there is a difference when I use
$splits = $test -split '`r?`n' # doesn't work in 5.1 and 7.2.1
$splits = $test -split '\r?\n' # works in 5.1 and 7.2.1 as suggested from Santiago and others
BUT
$splits = $test.Split("\r?\n") # doesn't work in 5.1 and 7.2.1
$splits = $test.Split("`r?`n") # doesn't work in 5.1 and 7.2.1
$splits = $test.Split([char[]]"\r\n") # doesn't work in 7.2.1
$splits = $test.Split([char[]]"`r`n") # works in 7.2.1

tl;dr:
Use -split '\r?\n' to split multiline text into lines irrespective of whether Windows-format CRLF or Unix-format LF newlines are used (it even handles a mix of these formats in a single string).
If you additionally want to handle CR-only newlines (which would be unusual, but appears to be the case for you), use -split '\r?\n|\r'
On Windows, with CRLF newlines only, .Split([Environment]::NewLine) works as intended only in PowerShell (Core) 7+, not in Windows PowerShell (where it happens to work only with CR-only newlines, as in your case). To explicitly split by CR only, .Split("`r") happens to work as intended in both editions, because it splits by a single character only.
# Works on both Unix and Windows, in both PowerShell editions.
# Input string contains a mix of CRLF and LF and CR newlines.
"one`r`ntwo`nthree`rfour" -split '\r?\n|\r' | ForEach-Object { "[$_]" }
Output:
[one]
[two]
[three]
[four]
This is the most robust approach, as you generally can not rely on input text to use the platform-native newline format, [Environment]::NewLine; see the bottom section for details.
Note:
The above uses PowerShell's -split operator, which operates on regexes (regular expressions), which enables the flexible matching logic shown above.
This regex101.com page explains the \r?\n|\r regex and allows you to experiment with it.
By contrast, the System.String.Split() .NET method only splits by literal strings, which, while faster, limits you to finding verbatim separators.
The syntax implications are:
Regex constructs such as escape sequences \r (CR) and \n (LF) are only supported by the .NET regex engine and therefore only by -split (and other PowerShell contexts where regexes are being used); ditto for regex metacharacters ? (match the preceding subexpression zero or one time) and | (alternation; match the subexpression on either side).
Inside strings (which is how regexes must be represented in PowerShell, preferably inside '...'), these sequences and characters have no special meaning, neither to PowerShell itself nor to the .Split() method, which treats them all verbatim.
By contrast, the analogous escape sequences "`r" (CR) and "`n" (LF) are PowerShell features, available in expandable strings, i.e. they work only inside "..." - not also inside verbatim strings, '...' - and are expanded to the characters they represent before the target operator, method, or command sees the resulting string.
This answer discusses -split vs. .Split() in more depth and recommends routine use of -split.
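To make the distinction between the two kinds of escape sequences concrete, here is a minimal sketch that splits the same CRLF text with three different patterns. The variable names are illustrative only, and the counts in the comments assume the sample string shown.

```powershell
# Sample input with Windows-format CRLF newlines.
$text = "one`r`ntwo`r`nthree"

# Regex escapes in a verbatim string: \r and \n are interpreted by the .NET regex engine.
$byRegexEscape = $text -split '\r?\n'

# PowerShell escapes in an expandable string: expanded to CR LF *before* -split sees them,
# so the regex engine receives two literal control characters.
$byExpandedString = $text -split "`r`n"

# PowerShell escapes in a verbatim string: the backticks stay literal characters,
# so the pattern matches nothing in $text and the input comes back as one element.
$byVerbatimBacktick = $text -split '`r`n'

$byRegexEscape.Count        # 3
$byExpandedString.Count     # 3
$byVerbatimBacktick.Count   # 1
```

The third case corresponds to the '`r?`n' attempt in the question: the pattern is valid, but it looks for literal backtick characters.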
As for what you tried:
Use [Environment]::NewLine only if you are certain that the input string uses the platform-native newline format. Notably, multiline string literals entered interactively at the PowerShell prompt use Unix-format LF newlines even on Windows (the only exception is the obsolescent Windows-only ISE, which uses CRLF).
String literals in script files (*.ps1) use the same newline format that the script is saved in - which may or may not be the platform's format.
Additionally, as you allude to in your own answer, the addition of a string parameter overload in the System.String.Split() method in .NET Core / .NET 5+ - and therefore PowerShell (Core) v6+ - implicitly caused a breaking change relative to Windows PowerShell: specifically, .Split('ab') splits by 'a' or 'b' - i.e. by any of the individual characters that make up the string - in Windows PowerShell, whereas it splits by the whole string, 'ab', in PowerShell (Core) v6+.
Such implicit breaking changes are rare, but they do happen, and they're outside PowerShell's control.
For that reason, you should always prefer PowerShell-native features for long-term stability, which in this case means preferring the -split operator to the .Split() .NET method.
That said, sometimes .NET methods are preferable for performance reasons; you can make them work robustly, but only if you carefully match the exact data types of the method overloads of interest, which may require a cast; see below.
See this answer for more information, including a more detailed explanation of the implicit breaking change.
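The ambiguity of .Split('ab') can be avoided by casting the separator to the exact parameter type of the overload you mean. The following is a small sketch with an arbitrary sample string; both calls should behave the same in Windows PowerShell and PowerShell (Core), because the casts leave no room for overload guessing.

```powershell
$sample = 'xabyaz'

# [char[]] cast: split by ANY of the characters 'a' or 'b'
# (what .Split('ab') does implicitly in Windows PowerShell).
$byAnyChar = $sample.Split([char[]] 'ab')

# [string[]] cast: split by the WHOLE string 'ab'
# (what .Split('ab') does implicitly in PowerShell (Core) 6+).
$byWholeString = $sample.Split([string[]] 'ab', [StringSplitOptions]::None)

$byAnyChar -join '|'       # x||y|z   (note the empty element between 'a' and 'b')
$byWholeString -join '|'   # x|yaz
```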
Your feedback on -split '\r?\n' not working for you and the solutions in your own answer suggest that your input string - unusually - uses CR-only newlines.
Your answer's solutions would not work as expected with Windows-format CRLF-format text, because splitting would happen for each CR and LF in isolation, which would result in extra, empty elements in the output array (each representing the empty string "between" a CRLF sequence).
If you did want to split by [Environment]::NewLine on Windows - i.e. by CRLF - and you wanted to stick with the .Split() method, in order to make it work in Windows PowerShell too, you'd need to call the overload that expects a [string[]] argument, indicating that each string (even if only one) is to be used as a whole as the separator - as opposed to splitting by any of its individual characters:
# On Windows, split by CRLF only.
# (Would also work on Unix with LF-only text.)
# In PowerShell (Core) 7+ only, .Split([Environment]::NewLine) would be enough.
"one`r`ntwo`r`nthree".Split([string[]] [Environment]::NewLine, [StringSplitOptions]::None) |
ForEach-Object { "[$_]" }
Output:
[one]
[two]
[three]
While this is obviously more ceremony than using -split '\r?\n', it does have the advantage of performing better - although that will rarely matter. See the next section for a generalization of this approach.
Using an unambiguous .Split() call for improved performance:
Note:
This is only necessary if -split '\r?\n' or -split '\r?\n|\r' turns out to be too slow in practice, which won't happen often.
To make this work robustly, in both PowerShell editions as well as long-term, you must carefully match the exact data types of the .Split() overload of interest.
The command below is the equivalent of -split '\r?\n|\r', i.e. it matches CRLF, LF, and CR newlines. Adapt the array of strings for more restrictive matching.
# Works on both Unix and Windows, in both PowerShell editions
"one`r`ntwo`nthree`rfour".Split(
[string[]] ("`r`n", "`n", "`r"),
[StringSplitOptions]::None
) | ForEach-Object { "[$_]" }

The reason: When pasting text into the terminal, it matters which terminal you are using. The default PowerShell 5.1 console, the ISE, and most other Windows software separate lines with both carriage return \r and newline \n characters. We can check by converting to bytes:
# 5.1 Desktop
$test = "a
b
c"
[byte[]][char[]]$test -join ','
97,13,10,98,13,10,99
#a,\r,\n, b,\r,\n, c
Powershell Core separates new lines with only a newline \n character
# 7.2 Core
$test = "a
b
c"
[byte[]][char[]]$test -join ','
97,10,98,10,99
On Windows OS, [Environment]::NewLine is \r\n no matter which console. On Linux, it is \n.
The solution: split multiline strings on either \r\n or \n (but not on only \r). The easy way here is with regex like @Santiago-squarzon suggests:
$splits = $test -split '\r?\n'
$splits[0]
a
$splits[1]
b
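If you are unsure which newline format a given string actually contains, counting the three variants is quicker than reading a byte dump. Below is a rough sketch; Get-NewlineStat is a hypothetical helper name made up for this illustration, not a built-in command.

```powershell
# Hypothetical helper: counts CRLF, LF-only, and CR-only newlines in a string.
function Get-NewlineStat {
  param([string] $Text)
  [pscustomobject]@{
    CRLF = [regex]::Matches($Text, '\r\n').Count       # CR immediately followed by LF
    LF   = [regex]::Matches($Text, '(?<!\r)\n').Count  # LF not preceded by CR
    CR   = [regex]::Matches($Text, '\r(?!\n)').Count   # CR not followed by LF
  }
}

# Mixed sample: one newline of each kind.
$stat = Get-NewlineStat "a`r`nb`nc`rd"
$stat   # CRLF = 1, LF = 1, CR = 1
```

A non-zero CR count would explain why -split '\r?\n' alone does not split a given input.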

Thanks to this site I found a solution:
https://n-v-o.github.io/2021-06-10-String-Method-in-Powershell-7/
In .NET 4, the string class only had methods that took characters as
parameter types. PowerShell sees this and automagically converts it,
to make life a little easier on you. Note there’s an implied ‘OR’ (|)
here as it’s an array of characters.
Why is PowerShell 7 behaving differently? In .NET 5, the string class
has some additional parameters that accept strings. PowerShell 7 does
not take any automatic action.
In order to fix my problem, I had to use this:
$test.Split("`r").Split("`n") #or
$test.Split([char[]]"`r`n")
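As the other answer points out, these calls split on CR and LF individually, so CRLF input produces empty elements. A short sketch with made-up sample data shows the effect and one possible mitigation; note that RemoveEmptyEntries also discards genuinely blank lines, which may or may not be what you want.

```powershell
$crlfText = "blue`r`ngreen`r`nyellow"

# Splitting on each character separately leaves an empty string "between" CR and LF.
$parts = $crlfText.Split([char[]] "`r`n")
$parts.Count    # 5: blue, '', green, '', yellow

# RemoveEmptyEntries drops those artifacts ...
$clean = $crlfText.Split([char[]] "`r`n", [StringSplitOptions]::RemoveEmptyEntries)
$clean.Count    # 3

# ... but it also drops intentional blank lines, unlike -split '\r?\n'.
$withBlankLine = "a`r`n`r`nb"
$withBlankLine.Split([char[]] "`r`n", [StringSplitOptions]::RemoveEmptyEntries).Count   # 2
($withBlankLine -split '\r?\n').Count                                                   # 3
```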

Related

Why does backtick in Set-Content not create multiple lines?

Neither of these commands creates multi-line text:
Set-Content .\test.md 'Hello`r`nWorld'
Set-Content .\test.md 'Hello\r\nWorld'
Only this one does:
Set-Content .\test.md @("Hello`nWorld")
Do you know why that is?
Escape sequences such as `r`n only work inside "...", i.e, expandable (interpolating) strings.
By contrast, '...' strings are verbatim strings that do not interpret their contents - even ` instances are used as verbatim (literally).
Only ` (the so-called backtick) serves as the escape character in PowerShell, not \.
That is, in both "..." and '...' strings a \ is a literal.
(However, \ is the escape character in the context of regexes (regular expressions), but it is then the .NET regex engine that interprets them, not PowerShell; e.g.,
"`r" -match '\r' is $true: the (interpolated) literal CR char. matched its escaped regex representation).
As for what you tried:
It is the fact that "Hello`nWorld" in your last command is a "..." string that made it work.
By contrast, enclosing the string in @(...), the array-subexpression operator, is incidental to the solution. (Set-Content's (positionally implied) -Value parameter is array-valued anyway (System.Object[]), so even a single string getting passed is coerced to an array).
Finally, note that Set-Content by default adds a trailing, platform-native newline to the output file; use -NoNewLine to suppress that, but note that doing so also places no newline between the (string representations of) multiple input objects, if applicable (in your case there's only one).
Therefore (note the -NoNewLine and the trailing `n):
Set-Content -NoNewLine .\test.md "Hello`nWorld`n"
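A quick way to see which of the string forms discussed above really contain control characters is to compare their lengths. This is only an illustrative check; the variable names are arbitrary.

```powershell
$verbatimBacktick  = 'Hello`r`nWorld'   # backticks kept literally
$verbatimBackslash = 'Hello\r\nWorld'   # backslashes kept literally
$expandedBacktick  = "Hello`r`nWorld"   # `r`n expanded to CR LF
$expandedBackslash = "Hello\r\nWorld"   # \ is not an escape character, even inside "..."

$verbatimBacktick.Length    # 14
$verbatimBackslash.Length   # 14
$expandedBacktick.Length    # 12 (5 + CR + LF + 5)
$expandedBackslash.Length   # 14

# Only the expanded-backtick form contains an actual line break.
($expandedBacktick -split '\r?\n').Count   # 2
```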
Optional reading: design rationale for PowerShell's behavior:
Why doesn't PowerShell use the backslash (\) as the escape character, like other languages?
Because PowerShell must (also) function on Windows (it started out as Windows-only), use of \ as the escape character - as known from Unix (POSIX-compatible) shells such as Bash - is not an option, given that \ is used as the path separator on Windows.
If \ were the escape character, you'd have to use Get-ChildItem C:\\Windows\\System32 instead of Get-ChildItem C:\Windows\System32, which is obviously impractical in a shell, where dealing with file-system paths is very common.
Thus, a different character had to be chosen, which turned out to be `, the so-called backtick: At least on US keyboards, it is easy to type (just like \), and it has the benefit of occurring rarely (as itself) in real-world strings, so that the need to escape it rarely arises.
Note that the much older legacy shell on Windows, cmd.exe, too had to pick a different character: it chose ^, the so-called caret.
Why doesn't it use single quote and double quote interchangeably, like other languages?
Different languages made different design choices, but in making "..." strings interpolating, but '...' strings not, PowerShell did follow existing languages here, namely that of POSIX-compatible shells such as Bash.
As an improvement on the latter PowerShell also supports embedding verbatim ' inside '...', escaped as '' (e.g., '6'' tall')
Given PowerShell's commitment to backward compatibility, this behavior won't change, especially given how fundamental it is to the language.
Conceptually speaking, you could argue that the aspect of what quoting character a string uses should be separate from whether it is interpolating, so that you'd be free to situationally choose one or the other quoting style for syntactic convenience, while separately controlling whether interpolation should occur.
Thus, hypothetically, PowerShell could have used a separate sigil to make a string interpolating, say $"..." and $'...' (similar to what C# now offers, though it notably only has one string-quoting style).
(As an aside: Bash and Ksh do have this syntax form, but it serves a different purpose (localization of strings) and is rarely used in practice).
In practice, however, once you know how "..." and '...' work in PowerShell, it isn't hard to make them work as intended.
See this answer for a juxtaposition of PowerShell, cmd.exe, and POSIX-compatible shells with respect to fundamental features.

Is there a way to set the default EOL separator for Out-File in PowerShell 5.1?

I am looking for a way to set the default EOL marker to 0x0A when writing text with Out-File.
On the internet, I found tons of examples that either replace 0x0D 0x0A after a file is already written, or -join the lines on 0x0A and then write the concatenated text into the file.
I find both approaches a bit clumsy as I'd just like to write the files with the redirection operator >.
So, is there a way to set the EOL style in PowerShell 5.1?
No, unfortunately, as of PowerShell 7.2, there is no way to make PowerShell use a different newline (EOL, line break) format.
It is the platform-native newline character or sequence (as reflected in [Environment]::NewLine) that is invariably used - both for separating multiple input objects and for the trailing newline by default.
To control the newline format, you need to:
Join the input objects explicitly with the newline character (sequence) of choice, using the -join operator, followed by another instance if a trailing newline is desired ...
... and use the -NoNewLine switch of Set-Content / Out-File (in lieu of >) so as to prevent appending of a trailing platform-native newline.
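The two steps above can be sketched as follows. The file name and sample lines are placeholders chosen for this illustration, and the read-back at the end is only there to verify the result.

```powershell
$lines = 'alpha', 'beta', 'gamma'

# Placeholder output path in the temp directory.
$path = Join-Path ([IO.Path]::GetTempPath()) 'lf-only-demo.txt'

# Step 1: join with LF explicitly and append a trailing LF.
$content = ($lines -join "`n") + "`n"

# Step 2: write with -NoNewline so no platform-native newline is appended.
Set-Content -NoNewline -LiteralPath $path -Value $content

# Verification: the file should contain no CR characters at all.
$written = [IO.File]::ReadAllText($path)
$written.Contains("`r")    # False

Remove-Item -LiteralPath $path
```

The same pattern should work with Out-File -NoNewline if you need Out-File's formatting behavior instead.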
As for potential future enhancements:
GitHub feature request #2872 suggests adding a parameter to Set-Content, specifically, to allow specifying the newline format; the request has been green-lighted (a long time ago), but has yet to be implemented - however, I think it isn't comprehensive enough, and it wouldn't help with Out-File / >; see next point.
GitHub feature request #3855 more generally asked for a -Delimiter parameter (to mirror Get-Content's existing parameter by that name) to be added to Set-Content / Add-Content, Out-File and Out-String
Unfortunately, the proposal was rejected; if it hadn't, you would have been able to configure > - a virtual alias of Out-File - to use LF-only newlines, for instance, as follows:
# WISHFUL THINKING
$PSDefaultParameterValues['Out-File:Delimiter'] = "`n"

Powershell Regex Multiline parsing

I am working on building a script that will analyze a configuration file (cisco switch config) and build a report based on certain findings. Sadly- the findings must be recorded on a specific form so this painful path is my only option outside of manual generation of each form.
What I'm trying to do:
Using the following I am attempting to pull the following multi-line expression into PS for evaluation
interface vlan1
no ip address
shutdown
!
I have found multiple sources that point towards one of two options- the first (and simplest) being to load the file into Get-content using the "-raw" switch in order to evaluate the entire file as a single string and then use the "select-string" command to output the specific information that I am looking for.
My basic code looks something like this
if (get-content -path U:\Testing\Test.txt -Raw | select-string -Pattern "(?ms)interface vlan1.*no ip address.*(?!no shutdown)shutdown.*\!" -Quiet)
{
write-host('pass')
}
else
{
write-host('fail')
}
Expected outcome: if the string is true- I will append the finding to a file (that part I have already)
If the configuration does not contain "shutdown" exclusively (without the word no) then it will be annotated as such (again I have that process as well)
Thank you in advance for your assistance- hopefully this is clear and concise.
Further clarity: the script returns false positives/negatives. when running the get-content + select-string outside of the if command- I basically get the -raw output but no match on the string itself, leading me to believe that I am having an issue with the start of line (interface vlan1) and the end line (!)
I have played with the structure of the regex string to try and tease out a solution but I am still a bit vague as to the usage of multi-line output while using select-string.
Since you need to look at the file in full, there's no reason to use the Select-String cmdlet, given that -match, the regular-expression matching operator, works more effectively on strings that are already in memory.
Note: -match only ever finds one match (if any); if this is not sufficient, use the [regex]::Matches() .NET method; it is unfortunate that there's no operator for multiple matches; GitHub issue #7867 proposes introducing one, named -matchall.
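For illustration, here is a minimal sketch of [regex]::Matches() collecting every interface line from a config string. The sample config and the vlan\d+ pattern are assumptions made for this example, and the input uses LF-only newlines so that $ can be used safely.

```powershell
# Made-up sample config with LF-only newlines.
$config = "interface vlan1`nshutdown`n!`ninterface vlan2`nno shutdown`n!"

# (?m) makes ^ and $ match at line boundaries; the capture group holds the interface name.
$found = [regex]::Matches($config, '(?m)^interface (vlan\d+)$')

$found.Count                                            # 2
$names = $found | ForEach-Object { $_.Groups[1].Value }
$names -join ','                                        # vlan1,vlan2
```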
Your regex is too permissive (greedy) due to use of .* across lines due to the (?s) matching option, so matching happens across multiple blocks.
The following uses a regex without .*, and instead explicitly matches the lines in full, including explicit matching of intervening newlines (\r?\n).[1]
This works with your sample input, but you may need to tweak the regex (omitting the (?s) option makes .* match only intra-line; expressions can be made non-greedy by modifying a duplication symbol with ? (e.g. .*?)).
$re = '(?m)^interface vlan1\r?\nno ip address\r?\n(?!no shutdown)shutdown\r?\n!'
if ((Get-Content U:\Testing\Test.txt -Raw) -match $re) {
# ...
}
Note: The assumption is that there's no need to validate that the trailing ! is the only character on its line; if that is needed, append (?:\r?\n|\z).[2]
[1] This regex matches both common newline formats: CRLF (\r\n, Windows) and LF (\n, Unix).
[2] Unfortunately, use of $ to assert the end of a line (with the (?m) option in effect) may not work if the input uses CRLF (\r\n) newlines, because the $ matches the position of a LF character (\n) only, which means that $ does not match immediately after !, due to the intervening \r.
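To see why the explicit line-by-line regex is preferable, the sketch below runs both patterns against two made-up config blocks held in strings rather than read from a file. The sample blocks are assumptions modeled on the snippet in the question.

```powershell
# Block that should pass: vlan1 is shut down.
$shut   = "interface vlan1`r`nno ip address`r`nshutdown`r`n!`r`n"
# Block that should fail: vlan1 has 'no shutdown'.
$noShut = "interface vlan1`r`nno ip address`r`nno shutdown`r`n!`r`n"

# Pattern from the question (greedy .* across lines).
$originalRe = '(?ms)interface vlan1.*no ip address.*(?!no shutdown)shutdown.*\!'
# Pattern from this answer (explicit lines and newlines).
$re = '(?m)^interface vlan1\r?\nno ip address\r?\n(?!no shutdown)shutdown\r?\n!'

$shut   -match $re           # True
$noShut -match $re           # False
$noShut -match $originalRe   # True: a false positive
```

The false positive arises because .* can consume the leading "no ", so the lookahead is evaluated at the "s" of "shutdown", where it no longer sees "no shutdown".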

ConvertTo-Json and ConvertFrom-Json with special characters

I have a file containing some properties, some of whose values contain escape characters, for example some URLs and regex patterns.
When I read the content and convert it back to JSON, with or without unescaping, the result is not correct. If I convert back to JSON with unescaping, some regular expressions break; if I convert without unescaping, URLs and some regular expressions break.
How can I solve the problem?
Minimal Complete Verifiable Example
Here are some simple code blocks to allow you simply reproduce the problem:
Content
$fileContent =
@"
{
"something": "http://domain/?x=1&y=2",
"pattern": "^(?!(\\`|\\~|\\!|\\#|\\#|\\$|\\||\\\\|\\'|\\\")).*"
}
"@
With Unescape
If I read the content and then convert the content back to json using following command:
$fileContent | ConvertFrom-Json | ConvertTo-Json | %{[regex]::Unescape($_)}
The output (which is wrong) would be:
{
"something": "http://domain/?x=1&y=2",
"pattern": "^(?!(\|\~|\!|\#|\#|\$|\||\\|\'|\")).*"
}
Without Unescape
If I read the content and then convert the content back to json using following command:
$fileContent | ConvertFrom-Json | ConvertTo-Json
The output (which is wrong) would be:
{
"something": "http://domain/?x=1\u0026y=2",
"pattern": "^(?!(\\|\\~|\\!|\\#|\\#|\\$|\\||\\\\|\\\u0027|\\\")).*"
}
Expected Result
The expected result should be same as the input file content.
I decided to not use Unescape, instead replace the unicode \uxxxx characters with their string values and now it works properly:
$fileContent =
@"
{
"something": "http://domain/?x=1&y=2",
"pattern": "^(?!(\\`|\\~|\\!|\\#|\\#|\\$|\\||\\\\|\\'|\\\")).*"
}
"@
$fileContent | ConvertFrom-Json | ConvertTo-Json | %{
[Regex]::Replace($_,
"\\u(?<Value>[a-zA-Z0-9]{4})", {
param($m) ([char]([int]::Parse($m.Groups['Value'].Value,
[System.Globalization.NumberStyles]::HexNumber))).ToString() } )}
Which generates the expected output:
{
"something": "http://domain/?x=1&y=2",
"pattern": "^(?!(\\|\\~|\\!|\\#|\\#|\\$|\\||\\\\|\\'|\\\")).*"
}
If you don't want to rely on regex (as in @Reza Aghaei's answer), you could import the Newtonsoft JSON library. Its default StringEscapeHandling setting escapes control characters only, and you avoid the potentially dangerous string replacements that the regex approach involves.
This is also the default handling in PowerShell Core (version 6 and up), which has used Newtonsoft internally since then. So another alternative is to use ConvertFrom-Json and ConvertTo-Json from PowerShell Core.
Your code would look something like this if you import the Newtonsoft JSON library:
[Reflection.Assembly]::LoadFile((Resolve-Path "Newtonsoft.Json.dll").Path) # LoadFile requires an absolute path
$json = Get-Content -Raw -Path file.json -Encoding UTF8 # read file
$unescaped = [Newtonsoft.Json.Linq.JObject]::Parse($json) # similar to ConvertFrom-Json
$escapedElementValue = [Newtonsoft.Json.JsonConvert]::ToString($unescaped.apiName.Value) # similar to ConvertTo-Json
$escapedCompleteJson = [Newtonsoft.Json.JsonConvert]::SerializeObject($unescaped) # similar to ConvertTo-Json
Write-Output "Variable passed = $escapedElementValue"
Write-Output "Same JSON as Input = $escapedCompleteJson"
Note:
Applying [regex]::Unescape() isn't called for, as JSON's escaping is unrelated to regex escaping.
That is, $fileContent | ConvertFrom-Json | ConvertTo-Json should work as-is, but doesn't due to a quirk in Windows PowerShell, which caused the & in your input string to be represented as its equivalent escape sequence on re-conversion, \u0026; the quirk similarly affects ' (\u0027), < (\u003c) and > (\u003e).
tl;dr
The problem does not affect PowerShell (Core) 6+ (the install-on-demand, cross-platform PowerShell edition), which uses a different implementation of the ConvertTo-Json and ConvertFrom-Json cmdlets, namely, as of PowerShell 7.2.x, one based on Newtonsoft.JSON (whose direct use is shown in r3verse's answer). There, your sample roundtrip command works as expected.
Only ConvertTo-Json in Windows PowerShell is affected (the bundled-with-Windows PowerShell edition whose latest and final version is 5.1). But note that the JSON representation - while unexpected - is technically correct.
A simple, but robust solution focused only on unescaping those Unicode escape sequences that ConvertTo-Json unexpectedly creates - namely for & ' < > - while ruling out false positives:
# The following sample JSON with undesired Unicode escape sequences for `& < > '`
# was created with Windows PowerShell's ConvertTo-Json as follows:
# ConvertTo-Json "Ten o'clock at <night> & later. \u0027 \\u0027"
$json = '"Ten o\u0027clock at \u003cnight\u003e \u0026 later. \\u0027 \\\\u0027"'
[regex]::replace(
$json,
'(?<=(?:^|[^\\])(?:\\\\)*)\\u(00(?:26|27|3c|3e))',
{ param($match) [char] [int] ('0x' + $match.Groups[1].Value) },
'IgnoreCase'
)
The above outputs the desired JSON representation, without the unnecessary escaping of &, ', <, and >, and without having falsely replaced the escaped substrings \\u0027 and \\\\u0027:
"Ten o'clock at <night> & later. \\u0027 \\\\u0027"
Background information:
ConvertTo-Json in Windows PowerShell unexpectedly represents the following ASCII-range characters by their Unicode escape sequences in JSON strings:
& (Unicode escape sequence: \u0026)
' (\u0027)
< and > (\u003c and \u003e)
There's no good reason to do so (these characters only require escaping in HTML/XML text).
However, any compliant JSON parser - including ConvertFrom-Json - converts these escape sequences back to the characters they represent.
In other words: While the JSON text created by Windows PowerShell's ConvertTo-Json is unexpected and can impede readability, it is technically correct and - while not identical - equivalent to the original representation in terms of the data it represents.
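The equivalence is easy to confirm by parsing both representations and comparing the resulting values. The two JSON strings below are hand-written samples based on the question's input.

```powershell
# Same data, once with the Unicode escape and once with the literal character.
$escaped = '{"something":"http://domain/?x=1\u0026y=2"}'
$plain   = '{"something":"http://domain/?x=1&y=2"}'

$fromEscaped = ($escaped | ConvertFrom-Json).something
$fromPlain   = ($plain   | ConvertFrom-Json).something

$fromEscaped                 # http://domain/?x=1&y=2
$fromEscaped -eq $fromPlain  # True
```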
Fixing the readability problem:
As an aside: While [regex]::Unescape(), whose purpose is to unescape regexes only, also converts Unicode escape sequences to the characters they represent, it is fundamentally unsuited to selectively unescaping Unicode sequences in JSON strings, given that all other \ escapes must be preserved in order for the JSON string to remain syntactically valid.
While your answer works well in general, it has limitations (aside from the easily corrected problem that a-zA-Z should be a-fA-F to limit matching to those letters that are valid hex. digits):
It doesn't rule out false positives, such as \\u0027 or \\\\u0027 (\\ escapes \, so that the u0027 part becomes a verbatim string and must not be treated as an escape sequence).
It converts all Unicode escape sequences, which presents two problems:
Escape sequences representing characters that require escaping would also be converted to the verbatim character representations, which would break the JSON representations with \u005c, for instance, given that the character it represents, \, requires escaping.
For non-BMP Unicode characters that must be represented as pairs of Unicode escape sequences (so-called surrogate pairs), your solution would mistakenly try to unescape each half of the pair separately.
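The false-positive problem can be reproduced with a tiny input. In the sketch below, the "naive" pattern is a simplified stand-in for the unguarded approach (it is not the exact regex from your answer), while the "guarded" pattern reuses the lookbehind shown at the top of this answer, generalized to any four hex digits for the sake of the demonstration.

```powershell
# JSON text for the 6-character string \u0027 (an escaped backslash followed by u0027).
$json = '"\\u0027"'

# Converts the captured hex digits to the character they denote.
$toChar = { param($m) [string] [char] [Convert]::ToInt32($m.Groups[1].Value, 16) }

# Naive: treats the second backslash plus u0027 as an escape sequence.
$naive = [regex]::Replace($json, '\\u([0-9a-fA-F]{4})', $toChar)

# Guarded: only matches \u preceded by an even number of backslashes.
$guarded = [regex]::Replace($json, '(?<=(?:^|[^\\])(?:\\\\)*)\\u([0-9a-fA-F]{4})', $toChar)

$naive     # "\'"       (the data has been altered)
$guarded   # "\\u0027"  (left untouched)
```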
For a robust solution that overcomes these limitations, see this answer
(surrogate pairs are left as Unicode escape sequences, Unicode escape sequences
whose characters require escaping are converted to \-based (C-style) escapes, such as \n, if possible).
However, if the only requirement is to unescape those Unicode escape sequences
that Windows PowerShell's ConvertTo-Json unexpectedly creates, the solution at the top is sufficient.

powershell -split('') specify a new line

Get-Content $user| Foreach-Object{
$user = $_.Split('=')
New-Variable -Name $user[0] -Value $user[1]}
I'm working on a script that should split a text file into an array, with one element per line.
What should I change the "=" sign to?
It depends on the exact encoding of the textfile, but [Environment]::NewLine usually does the trick.
"This is `r`na string.".Split([Environment]::NewLine)
Output:
This is
a string.
The problem with the String.Split method is that it splits on each character in the given string. Hence, if the text file has CRLF line separators, you will get empty elements.
Better solution, using the -Split operator.
"This is `r`na string." -Split "`r`n" #[Environment]::NewLine, if you prefer
You can use the String.Split method to split on CRLF and not end up with the empty elements by using the Split(String[], StringSplitOptions) method overload.
There are a couple different ways you can use this method to do it.
Option 1
$input.Split([string[]]"`r`n", [StringSplitOptions]::None)
This will split on the combined CRLF (Carriage Return and Line Feed) string represented by `r`n. The [StringSplitOptions]::None option will allow the Split method to return empty elements in the array, but there should not be any if all the lines end with a CRLF.
Option 2
$input.Split([Environment]::NewLine, [StringSplitOptions]::RemoveEmptyEntries)
This will split on either a Carriage Return or a Line Feed. So the array will end up with empty elements interspersed with the actual strings. The [StringSplitOptions]::RemoveEmptyEntries option instructs the Split method to not include empty elements.
The answers given so far consider only Windows as the running environment. If your script needs to run in a variety of environments (Linux, Mac and Windows), consider using the following snippet:
$lines = $input.Split(
@("`r`n", "`r", "`n"),
[StringSplitOptions]::None)
There is a simple and unusual way to do this.
$lines = [string[]]$input
This will split $input like:
$input.Split(@("`r`n", "`n"))
This is undocumented at least in docs for Conversions.
Beware, this will not remove empty entries.
And it doesn't work for Carriage Return (\r) line ending at least on Windows.
Experimented in Powershell 7.2.
This article also explains a lot about how it works with carriage return and line ends. https://virot.eu/powershell-and-newlines/
I had some issues with additional empty lines and the like, and this article helped me understand the cause. Excerpt from virot.eu:
So what makes up a new line. Here comes the tricky part, it depends.
To understand this we need to go to the line feed the character.
Line feed is the ASCII character 10. It in most programming languages
escaped by writing \n, but in powershell it is `n. But Windows is not
content with just one character, Windows also uses carriage return
which is ASCII character 13. Escaped \r. So what is the difference?
Line feed advances the pointer down one row and carriage return
returns it to the left side again. If you store a file in Windows by
default are linebreaks are stored as first a carriage return and then
a line feed (\r\n). When we aren’t using any parameters for the
split() command it will split on all white-space characters, that is
both carriage return, linefeed, tabs and a few more. This is why we
are getting 5 results when there is both carriage return and line
feeds.