(PowerShell) split string with escaped separator characters - powershell

The -split operator is often used to split Active Directory Distinguished Names and Canonical Names into their components (RDNs), conveniently forgetting about the escaped separator characters that may appear in OUs and CNs, for example:
Distinguished Name Example with an escaped comma:
CN=Test User,OU=Comma\,Test,OU=Test,DC=domain,DC=com
Canonical Name Example with an escaped slash:
Domain.com/Test/Slash\/Test/Test User
Many splitting examples on the internet do not even mention this trap. They may work for a long time, but sooner or later the flaw surfaces and is painful to troubleshoot.
I don’t think there is an easy way to correctly split escaped strings using a Regular Expression (see also: Is there a pure regex split of a string containing escape sequences?).

Using negative lookbehind:
$text = 'CN=Test User,OU=Comma\,Test,OU=Test,DC=domain,DC=com'
$text -split '(?<!\\),'
CN=Test User
OU=Comma\,Test
OU=Test
DC=domain
DC=com
$text = 'Domain.com/Test/Slash\/Test/Test User'
$text -split '(?<!\\)/'
Domain.com
Test
Slash\/Test
Test User

To summarize and complement the existing, helpful answers:
mjolinor's answer works well if you needn't worry about \\ appearing in the input as an escaped \.
If \\ were present, the solution would misinterpret the , in \\, as escaped (rather than an escaped \ followed by an unescaped ,).
iRon's own answer addresses that problem with a more sophisticated regex.
Additionally, you may want to remove the escape characters after splitting; building on a regex provided by Wiktor Stribiżew here and adding a -replace operation with regex \\(.):
PS> 'foo,bar\,baz,bang\\,last' -split '(?<=(?<!\\)(?:\\\\)*),' -replace '\\(.)', '$1'
foo
bar,baz
bang\
last
Here's a simple utility function that wraps the above, with a configurable separator and escape character:
function Split-Text {
param(
[Parameter(Mandatory=$True)] [string] $Text,
[Parameter(Mandatory=$True)] [string] $Separator,
[string] $EscapeChar = '\'
)
$Text -split
('(?<=(?<!{0})(?:{0}{0})*){1}' -f [regex]::Escape($EscapeChar), [regex]::Escape($Separator)) `
-replace ('{0}(.)' -f [regex]::Escape($EscapeChar)), '$1'
}
# Sample call - yields the same as above.
Split-Text 'foo,bar\,baz,bang\\,last' ','
# With "/" as the separator - analogous output.
Split-Text 'foo/bar\/baz/bang\\/last' '/'

I think there is still a little trap, as RDNs could potentially end with a backslash (which is itself escaped with an additional backslash):
$text = 'CN=Test User,OU=EndSlash\\,OU=Comma\,Test,DC=domain,DC=com'
$text -split '(?<!\\),'
CN=Test User
OU=EndSlash\\,OU=Comma\,Test
DC=domain
DC=com
In other words, the separator should only be skipped if it is preceded by an odd number of backslashes.
To cover this, I think the complete regular expressions should be:
(?<![^\\](\\\\)*\\), (for Distinguished Names) and
(?<![^\\](\\\\)*\\)/ (for Canonical Names).
$text = 'CN=Test User,OU=EndSlash\\,OU=Comma\,Test,DC=domain,DC=com'
$text -split '(?<![^\\](\\\\)*\\),'
CN=Test User
OU=EndSlash\\
OU=Comma\,Test
DC=domain
DC=com
$text = 'Domain.com/Slash\/Test/EndSlash\\/Test/Test User'
$text -split '(?<![^\\](\\\\)*\\)/'
Domain.com
Slash\/Test
EndSlash\\
Test
Test User
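The odd-number-of-backslashes rule can also be implemented without a regex, by scanning the string and tracking whether the previous character was an unconsumed escape character. The following is only a minimal sketch: the function name Split-EscapedString and its parameters are hypothetical (not part of any answer here), and it keeps the escape characters in the output, like the regex above.

```powershell
function Split-EscapedString {   # hypothetical helper name
    param(
        [Parameter(Mandatory)] [string] $Text,
        [char] $Separator = ',',
        [char] $Escape = '\'
    )
    $parts   = [System.Collections.Generic.List[string]]::new()
    $current = [System.Text.StringBuilder]::new()
    $escaped = $false            # true if the previous char was an active escape char
    foreach ($ch in $Text.ToCharArray()) {
        if ($escaped) {
            # The character after an escape char is always taken literally.
            [void]$current.Append($ch)
            $escaped = $false
        }
        elseif ($ch -eq $Escape) {
            [void]$current.Append($ch)
            $escaped = $true
        }
        elseif ($ch -eq $Separator) {
            # Unescaped separator: close the current part.
            $parts.Add($current.ToString())
            [void]$current.Clear()
        }
        else {
            [void]$current.Append($ch)
        }
    }
    $parts.Add($current.ToString())
    $parts
}

Split-EscapedString 'CN=Test User,OU=EndSlash\\,OU=Comma\,Test,DC=domain,DC=com'
Split-EscapedString 'Domain.com/Test/Slash\/Test/Test User' -Separator '/'
```

For the first call this should give the same five parts as the regex version (including OU=EndSlash\\ and OU=Comma\,Test as separate items).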

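Therefore I have created a little function that adds an escape feature to the existing -split operator. Note that it simply protects every escape character that is immediately followed by the delimiter, so it does not handle an escaped backslash in front of a real separator (\\,):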
Function Split {
param(
[Parameter(Mandatory = $True, ValueFromPipeline = $true)][String]$String,
[Parameter(Mandatory = $False, Position = 0)][String]$Delimiter = " ",
[Parameter(Mandatory = $False, Position = 1)][Int]$MaxSubstrings = 0,
[Parameter(Mandatory = $False, Position = 2)][String]$Escape,
[Parameter(Mandatory = $False, Position = 3)][String]$Options = ""
)
If ($Escape) {$String = $String.Replace("$Escape$Delimiter", [String][Char]27)}
$Split = $String -Split $Delimiter, $MaxSubstrings, $Options
If ($Escape) {$Split | ForEach {$_.Replace([String][Char]27, "$Escape$Delimiter")}} Else {$Split}
}
"CN=Test User,OU=Comma\,Test,OU=Test,DC=domain,DC=com" | Split "," -Escape "\"
"Domain.com/Test/Slash\/Test/Test User" | Split "/" -Escape "\"

Related

Replace string with line break with another string in Powershell

I want to replace
$fieldTool.GetFieldValue($i
tem,"Title")
with
{{(sc_get_field_value i_item 'Title')}}
The original string has a line break, and I am using `n for it, like this: $fieldTool.GetFieldValue($i`ntem,"Title")
This is the code
$template = '<div class="tile-inspiration__title field-title">$fieldTool.GetFieldValue($i
tem,"Title")</div>'
$matchString = '$fieldTool.GetFieldValue($i`ntem,"Title")'
$pattern = $([regex]::escape($matchString))
$replaceString = "{{(sc_get_field_value i_item 'Title')}}"
$newtemplate = $template -replace $pattern, $replaceString
Write-Host $newtemplate
The above code is not working. How can I replace a string that contains a line break with another string?
Any suggestion would be appreciated.
Thanks in advance
To replace newlines, you should use regex pattern \r?\n. This will match both *nix as well as Windows newlines.
Your template string, however, contains several characters that have special meaning in regex, so you need [regex]::Escape(). That would also escape the characters of \r?\n, rendering it as \\r\?\\n, so adding the newline pattern to $matchString before escaping it is of no use.
Instead, you can first put a placeholder character where the newline is, one that does not otherwise occur in $matchString and has no special meaning in regex, and swap it for the newline pattern after escaping.
$template = '<div class="tile-inspiration__title field-title">$fieldTool.GetFieldValue($i
tem,"Title")</div>'
# for demo, I chose to replace the newline with an underscore
$matchString = '$fieldTool.GetFieldValue($i_tem,"Title")'
# now, escape the string and after that replace the underscore by the wanted \r?\n pattern
$pattern = [regex]::escape($matchString) -replace '_', '\r?\n'
# $pattern is now: \$fieldTool\.GetFieldValue\(\$i\r?\ntem,"Title"\)
$replaceString = "{{(sc_get_field_value i_item 'Title')}}"
# this time, the replacement should work
$newtemplate = $template -replace $pattern, $replaceString
Write-Host $newtemplate # --> <div class="tile-inspiration__title field-title">{{(sc_get_field_value i_item 'Title')}}</div>
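An alternative that avoids picking a placeholder character is to escape the text before and after the line break separately and join the pieces with the newline pattern. This is only a sketch; $parts is an illustrative variable name, and the template is built with an explicit CRLF here so the example does not depend on how the script file is saved.

```powershell
# Template with a Windows-style line break in the middle of the expression.
$template = '<div class="tile-inspiration__title field-title">$fieldTool.GetFieldValue($i' +
            "`r`n" + 'tem,"Title")</div>'

# The literal text on either side of the line break.
$parts = '$fieldTool.GetFieldValue($i', 'tem,"Title")'

# Escape each piece on its own, then join with the (unescaped) newline pattern.
$pattern = ($parts | ForEach-Object { [regex]::Escape($_) }) -join '\r?\n'

$replaceString = "{{(sc_get_field_value i_item 'Title')}}"
$newtemplate = $template -replace $pattern, $replaceString
Write-Host $newtemplate
```

Because the newline pattern is added after escaping, it matches both `r`n and `n line breaks.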

PowerShell UpperCase Replace

Ok... after 99 different combinations...
I want to replace, in thousands of files, all occurrences of EnumMessage.something with EnumMessage.SOMETHING, that is, uppercase the second word. The word may stand alone or be followed by a dot or a (
$output = 'EnumMessage.test(something) and EnumMessage.Tezt andz EnumMessage.ALREAdY. ' -creplace 'EnumMessage\.(\w+)', 'EnumMessage.$1.ToUpper()'
$output
So the above inserts the literal text .ToUpper() after the word, but it does not uppercase the second word.
In PowerShell 6 and later, the -replace operator also accepts a script block that performs the replacement. The script block runs once for every match.
In PowerShell 5, apply the Regex.Replace Method.
$string = 'EnumMessage.test(something) and EnumMessage.Tezt andz EnumMessage.ALREAdY. '
$pattern = '(?<=EnumMessage\.)(\w+)'
# (?<=EnumMessage\.) = positive lookbehind
if ( $PSVersionTable.PSVersion.Major -ge 6 ) {
$string -replace $pattern, { $_.Value.ToUpper() }
} else {
[regex]::Replace( $string, $pattern, { $args.Value.ToUpper() } )
}
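Since [regex]::Replace() with a script block also works in PowerShell 6 and later, the version check can be dropped if a single code path is preferred. A small sketch of that variant; the parameter name $m is only illustrative:

```powershell
$string  = 'EnumMessage.test(something) and EnumMessage.Tezt andz EnumMessage.ALREAdY. '
$pattern = '(?<=EnumMessage\.)\w+'

# The script block is converted to a MatchEvaluator delegate and is called
# once per match; $m is the current [System.Text.RegularExpressions.Match].
$result = [regex]::Replace($string, $pattern, { param($m) $m.Value.ToUpper() })
$result
```

For the sample string this yields EnumMessage.TEST(something) and EnumMessage.TEZT andz EnumMessage.ALREADY. (with the trailing space preserved).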
There are definitely some challenges. I'm really not the best at RegEx. Any time I tried to leverage the $matches collection I was only able to replace the first match. There's probably something I'm forgetting about that functionality. However, I was able to cook up the below:
[RegEx]::Matches($String, '(?<=EnumMessage\.)\w+') |
ForEach-Object{
$Replace = $String.Substring($_.Index, $_.Length).ToUpper()
$String = $String.Remove($_.Index, $_.Length)
$String = $String.Insert($_.Index, $Replace)
}
$String
Note: I used a RegEx lookbehind, but I'm not positive that had much to do with the outcome.
The .NET [RegEx] class returns objects with the location of each match in the string, so I used that to remove and then re-insert the uppercased strings. This should return: EnumMessage.TEST(something) and EnumMessage.TEZT andz EnumMessage.ALREADY.

Powershell replace last two occurrences of a '/' in file path with '.'

I have a file path, and I'm trying to replace the last two occurrences of the \ character with . and also completely remove the '{}' via PowerShell, to then store the result in a variable.
So, turn this:
xxx-xxx-xx\xxxxxxx\x\{xxxx-xxxxx-xxxx}\xxxxx\xxxxx
Into this:
xxx-xxx-xx\xxxxxxx\x\xxxx-xxxxx-xxxx.xxxxx.xxxxx
I've tried to get this working with the replace cmdlet, but this seems to focus more on replacing all occurrences or the first/last occurrence, which isn't my issue. Any guidance would be appreciated!
Edit:
So, I have an excel file and i'm creating a powershell script that uses a for each loop over every row, which amounts to thousands of entries. For each of those entries, I want to create a secondary variable that will take the full path, and save that path minus the last two slashes. Here's the portion of the script that i'm working on:
Foreach($script in $roboSource)
{
$logFileName = "$($script.a).txt".Replace('(?<=^[^\]+-[^\]+)-','.')
}
$script.a will output thousands of entries in this format:
xxx-xxx-xx\xxxxxxx\x\{xxxx-xxxxx-xxxx}\xxxxx\xxxxx
Which is expected.
I want $logFileName to output this:
xxx-xxx-xx\xxxxxxx\x\xxxx-xxxxx-xxxx.xxxxx.xxxxx
I'm just starting to understand regex, and I believe the capture group between the parenthesis should be catching at least one of the '\', but testing attempts show no changes after adding the replace+regex.
Please let me know if I can provide more info.
Thanks!
You can do this in two fairly simple -replace operations:
Remove { and }
Replace the last two \:
$str = 'xxx-xxx-xx\xxxxxxx\x\{xxxx-xxxxx-xxxx}\xxxxx\xxxxx'
$str -replace '[{}]' -replace '\\([^\\]*)\\([^\\]*)$','.$1.$2'
The second pattern matches:
\\ # 1 literal '\'
( # open first capture group
[^\\]* # 0 or more non-'\' characters
) # close first capture group
\\ # 1 literal '\'
( # open second capture group
[^\\]* # 0 or more non-'\' characters
) # close second capture group
$ # end of string
We replace the match with the values of the first and second capture groups, each preceded by . instead of \: .$1.$2
If you're using PowerShell Core version 6.1 or newer, you can also take advantage of right-to-left -split:
($str -replace '[{}]' -split '\\',-3) -join '.'
-split '\\',-3 has the same effect as -split '\\',3, but splitting from the right rather than the left.
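To use the two -replace operations inside the loop from the question, they can be wrapped in a small helper. This is a sketch under assumptions: the function name ConvertTo-LogFileName is hypothetical, and the .txt suffix is taken from the question's attempt.

```powershell
function ConvertTo-LogFileName {   # hypothetical helper name
    param([Parameter(Mandatory)] [string] $Path)
    # 1) drop the braces, 2) turn the last two "\" into "."
    $flat = $Path -replace '[{}]' -replace '\\([^\\]*)\\([^\\]*)$', '.$1.$2'
    $flat + '.txt'
}

# Stand-in for the values of $script.a from the question.
$paths = @('xxx-xxx-xx\xxxxxxx\x\{xxxx-xxxxx-xxxx}\xxxxx\xxxxx')
foreach ($p in $paths) {
    $logFileName = ConvertTo-LogFileName $p
    $logFileName
}
```

For the sample path this produces xxx-xxx-xx\xxxxxxx\x\xxxx-xxxxx-xxxx.xxxxx.xxxxx.txt. Note that -replace is the regex-aware operator; the .Replace() string method used in the question treats its first argument as literal text, which is why the regex there had no effect.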
A 2-step approach is simplest in this case:
# Input string.
$str = 'xxx-xxx-xx\xxxxxxx\x\{xxxx-xxxxx-xxxx}\xxxxx\xxxxx'
# Get everything before the "{"
$prefix = $str -replace '\{.+'
# Get everything starting with the "{", remove "{ and "}",
# and replace "\" with "."
$suffix = $str.Substring($prefix.Length) -replace '[{}]' -replace '\\', '.'
# Output the combined result (or assign to $logFileName)
$prefix + $suffix
If you wanted to do it with a single -replace operation (with nesting), things get more complicated:
Note: This solution requires PowerShell Core (v6.1+)
$str -replace '(.+)\{(.+)\}(.+)',
{ $_.Groups[1].Value + $_.Groups[2].Value + ($_.Groups[3].Value -replace '\\', '.') }
Also see the elegant PS-Core-only -split based solution with a negative index (to split only a fixed number of tokens off the end) in Mathias R. Jessen's helpful answer.
Try this:
$str='xxx-xxx-xx\xxxxxxx\x\{xxxx-xxxxx-xxxx}\xxxxx\xxxxx'
# Remove the braces and split on "\" to get an array
$Array=$str -replace '[{}]' -split '\\'
# Join all but the last two elements with "\", then append the last two elements separated by "."
"{0}.{1}.{2}" -f ($Array[0..($Array.Length -3)] -join '\'), $Array[-2], $Array[-1]

PowerShell Double Escape string with text "$__VAR__"

One of my scripts can be stripped down to the following code.
function Replace
{
[CmdletBinding()]
Param
(
[Parameter(Mandatory, Position=0)]
[string]
$LiteralPath,
[Parameter(Mandatory, Position=1)]
[string]
$Expression,
[Parameter(Mandatory, Position=2)]
[string]
$Replacement
)
Get-Content $LiteralPath | ForEach-Object {$_ -replace $Expression, $Replacement } | Set-Content ($LiteralPath + ".temp")
}
An example call to the script would be
Replace ".\MyFile.txt" "^#define abc.*" "#define abc 1"
I have run into a situation where the string I need to find and replace contains both dollar signs and underscores. The dollar signs must be escaped to prevent PowerShell from expanding the variable. One string contains a dollar sign followed by an underscore. This causes an issue because PowerShell does not expand the variable name but then expands the $_ pipeline variable. How can I prevent PowerShell from expanding both the variable name and the pipeline token?
This is an example call to the function with a string I need to escape.
Replace ".\MyFile.txt" "^\#\`$__LIBRARY_DIR\\prj.gpj" "`$__LIBRARY_DIR\prj.gpj"
In this example the line of text which reads #$__LIBRARY_DIR\prj.gpj is getting changed to #$__LIBRARY_DIR\prj.gpj_LIBRARY_DIR\prj.gpj. I am looking for the text to be changed to $__LIBRARY_DIR\prj.gpj
Notice the $_ is expanded which I do not want it to expand. I have tried adding more escape characters but that only causes them to appear in the file. How can the string be escaped to prevent $_ from expanding?
In PowerShell, if you don't want variables to expand in your string, use 'single quotes' instead of "double quotes"; that saves you the trouble of escaping the $ sign with backticks.
In your case there is an additional challenge: the -replace operator itself interprets $ sequences in the replacement string, regardless of the type of quotes you use. There, $_ is a substitution that stands for the entire input string, which is why the whole line shows up again in your result.
To tell -replace that you really want to see that $ in your replacement string, you need to write $$:
'#$__LIBRARY_DIR\prj.gpj' -replace '^#\$__LIBRARY_DIR\\prj.gpj','$$__LIBRARY_DIR\prj.gpj'
Note: As others have correctly pointed out in the comments, if your task is just to strip a leading # from such expressions, you can do that more simply:
'#$__LIBRARY_DIR\prj.gpj' -replace '^#'
Or alternatively with the good old "trim":
'#$__LIBRARY_DIR\prj.gpj'.TrimStart('#')
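When the search text and the replacement arrive as plain strings (as in the Replace function from the question), both can be made literal programmatically instead of escaping them by hand. A minimal sketch, with illustrative variable names: [regex]::Escape() neutralizes the pattern, and doubling every $ neutralizes the replacement.

```powershell
$line        = '#$__LIBRARY_DIR\prj.gpj'
$find        = '#$__LIBRARY_DIR\prj.gpj'   # literal text to look for
$replacement = '$__LIBRARY_DIR\prj.gpj'    # literal text to insert

# Escape regex metacharacters in the search text and anchor it at line start.
$pattern = '^' + [regex]::Escape($find)

# In a replacement string "$$" stands for one literal "$",
# so "$_" can no longer be read as "the entire input string".
$safeReplacement = $replacement.Replace('$', '$$')

$result = $line -replace $pattern, $safeReplacement
$result
```

This outputs $__LIBRARY_DIR\prj.gpj. The trade-off is that real substitutions such as $1 are no longer available in the replacement text.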

How to strip illegal characters before trying to save filenames?

I was able to find how to use the GetInvalidFileNameChars() method in a PowerShell script. However, it seems to also filter out whitespace (which is what I DON'T want).
EDIT: Maybe I'm not asking this clearly enough. I want the below function to KEEP the spaces that already exist in filenames. Currently, the script filters out spaces.
Function Remove-InvalidFileNameChars {
param([Parameter(Mandatory=$true,
Position=0,
ValueFromPipeline=$true,
ValueFromPipelineByPropertyName=$true)]
[String]$Name
)
return [RegEx]::Replace($Name, "[{0}]" -f ([RegEx]::Escape([String][System.IO.Path]::GetInvalidFileNameChars())), '')}
Casting the character array to System.String actually seems to join the array elements with spaces, meaning that
[string][System.IO.Path]::GetInvalidFileNameChars()
does the same as
[System.IO.Path]::GetInvalidFileNameChars() -join ' '
when you actually want
[System.IO.Path]::GetInvalidFileNameChars() -join ''
As @mjolinor mentioned (+1), this is caused by the output field separator ($OFS).
Evidence:
PS C:\> [RegEx]::Escape([string][IO.Path]::GetInvalidFileNameChars())
"\ \ \|\ \ ☺\ ☻\ ♥\ ♦\ ♣\ ♠\ \\ \t\ \n\ ♂\ \f\ \r\ ♫\ ☼\ ►\ ◄\ ↕\ ‼\ ¶\ §\ ▬\ ↨\ ↑\ ↓\ →\ ←\ ∟\ ↔\ ▲\ ▼\ :\ \*\ \?\ \\\ /
PS C:\> [RegEx]::Escape(([IO.Path]::GetInvalidFileNameChars() -join ' '))
"\ \ \|\ \ ☺\ ☻\ ♥\ ♦\ ♣\ ♠\ \\ \t\ \n\ ♂\ \f\ \r\ ♫\ ☼\ ►\ ◄\ ↕\ ‼\ ¶\ §\ ▬\ ↨\ ↑\ ↓\ →\ ←\ ∟\ ↔\ ▲\ ▼\ :\ \*\ \?\ \\\ /
PS C:\> [RegEx]::Escape(([IO.Path]::GetInvalidFileNameChars() -join ''))
"\| ☺☻♥♦\t\n♂\f\r♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼:\*\?\\/
PS C:\> $OFS=''
PS C:\> [RegEx]::Escape([string][IO.Path]::GetInvalidFileNameChars())
"\| ☺☻♥♦\t\n♂\f\r♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼:\*\?\\/
Change your function to something like this:
Function Remove-InvalidFileNameChars {
param(
[Parameter(Mandatory=$true,
Position=0,
ValueFromPipeline=$true,
ValueFromPipelineByPropertyName=$true)]
[String]$Name
)
$invalidChars = [IO.Path]::GetInvalidFileNameChars() -join ''
$re = "[{0}]" -f [RegEx]::Escape($invalidChars)
return ($Name -replace $re)
}
and it should do what you want.
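The effect of $OFS on the resulting character class can be reproduced with a small stand-in character set, which keeps the output readable and independent of the platform's actual list of invalid characters. A sketch, assuming $OFS is at its default (unset, which behaves like a single space); the variable names are illustrative:

```powershell
$chars = [char[]]'*?'                 # stand-in for GetInvalidFileNameChars()

# Casting the array to [string] joins the elements with $OFS (a space by default),
# so the space ends up inside the character class.
$classFromCast = '[{0}]' -f [regex]::Escape([string]$chars)       # [\*\ \?]

# Joining with an empty string keeps the space out of the class.
$classFromJoin = '[{0}]' -f [regex]::Escape(($chars -join ''))    # [\*\?]

$name = 'my file*?.txt'
$name -replace $classFromCast         # myfile.txt   (space removed as well)
$name -replace $classFromJoin         # my file.txt  (space kept)
```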
My current favourite way to accomplish this is:
$Path.Split([IO.Path]::GetInvalidFileNameChars()) -join '_'
This replaces all invalid characters with _ and is very human readable, compared to alternatives such as:
$Path -replace "[$([RegEx]::Escape([string][IO.Path]::GetInvalidFileNameChars()))]+","_"
I suspect it has to do with non-display characters being coerced to [string] for the regex operation (and ending up expressed as spaces).
See if this doesn't work better:
([char[]]$name | where { [IO.Path]::GetinvalidFileNameChars() -notcontains $_ }) -join ''
That will do a straight char comparison, and seems to be more reliable (embedded spaces are not removed).
$name = 'abc*\ def.txt'
([char[]]$name | where { [IO.Path]::GetinvalidFileNameChars() -notcontains $_ }) -join ''
abc def.txt
Edit - I believe @Ansgar is correct about the space being caused by casting the character array to string. The space is being introduced by $OFS.
I wanted every illegal character replaced with a space (so an existing space simply stays a space):
$Filename = $ADUser.SamAccountName
[IO.Path]::GetinvalidFileNameChars() | ForEach-Object {$Filename = $Filename.Replace($_," ")}
$Filename = "folder\" + $Filename.trim() + ".txt"
Please try this one-liner with the same underlying function.
to match
'?Some "" File Name <:.txt' -match ("[{0}]"-f (([System.IO.Path]::GetInvalidFileNameChars()|%{[regex]::Escape($_)}) -join '|'))
to replace
'?Some "" File Name <:.txt' -replace ("[{0}]"-f (([System.IO.Path]::GetInvalidFileNameChars()|%{[regex]::Escape($_)}) -join '|')),'_'
[System.IO.Path]::GetInvalidFileNameChars() returns an array of invalid chars. If it is returning the space character for you (which it does not do for me), you could always iterate over the array and remove it.
> $chars = @()
> foreach ($c in [System.IO.Path]::GetInvalidFileNameChars())
{
if ($c -ne ' ')
{
$chars += $c
}
}
Then you can use $chars as you would have used the output from GetInvalidFileNameChars().
Very slightly different, combined, flexible approach. I was finding that GetInvalidFileNameChars() was not getting all the illegal chars for my needs.
$arrInvalidChars = '[]/|\+={}-$%^&*()'.ToCharArray()
$cleanName = 'a[]|\+={9}-$%^&*()\b'
$arrInvalidChars | % { $cleanName = $cleanName.replace($_,'_')}
Returns $cleanName = 'a_______9__________b'