Powershell regex -replace matches more often than it should


I have the following Regular Expression: ([a-z])([A-Z])
When I plug it into RegEx 101 it seems to work perfectly: https://regex101.com/r/vhifNL/1
But when I plug it into Powershell to have the matches replaced with dashes, it goes crazy:
"JavaScript" -replace '([a-z])([A-Z])', '$1-$2'
I expect to get Java-Script. But instead I get:
J-av-aS-cr-ip-t
Why is it not matching the same way that RegEx101 has it match?
NOTE: This question is not tagged with RegEx on purpose. I would take it as a kindness if no-one added it. The RegEx folks have a different set of rules they run by for questions and will likely close my question.

PowerShell's -replace operator, like all PowerShell operators that can operate on strings (notably -match, -eq, -like, -contains and their negated counterparts), and like PowerShell in general, is case-insensitive by default.
However, all such operators have case-sensitive variants, selected by simply prepending c to the operator name, namely -creplace in the case at hand:
PS> "JavaScript" -creplace '([a-z])([A-Z])', '$1-$2'
Java-Script
As for what you tried:
Because -replace is case-insensitive (which you can optionally signal explicitly with the -ireplace alias), your regex was essentially equivalent to:
([a-zA-Z])([a-zA-Z])
and therefore matched any two consecutive (ASCII-range) letters, not the desired transition from a lowercase to an uppercase letter.
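To see the difference side by side, here is a minimal sketch that runs the regex from the question through the three operator variants. The variable names are illustrative only.

```powershell
$name    = 'JavaScript'
$pattern = '([a-z])([A-Z])'

# Case-insensitive (the default): both character classes match any ASCII letter.
$insensitive = $name -replace $pattern, '$1-$2'     # J-av-aS-cr-ip-t

# -ireplace is merely an explicit alias of -replace.
$explicit = $name -ireplace $pattern, '$1-$2'       # J-av-aS-cr-ip-t

# Case-sensitive: only a lowercase-to-uppercase transition matches.
$sensitive = $name -creplace $pattern, '$1-$2'      # Java-Script

$insensitive, $explicit, $sensitive
```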

Related

Powershell match similar entries in an array

I've written a script to check for VM folders in VMware vCenter that don't match the corresponding VM name.
There are a few automatically deployed VMs which I need to exclude from this check. Those VMs are always similarly named, but with an incremented number at the end. I've declared an array $Vmstoignore containing their names as strings, and I'm trying to match my $VmName against this array, but it does not work. I've also tried it with -like, but I cannot get that to work either.
$Vmstoignore = @("Guest Introspection", "Trend Micro Deep Security")
$VmName = "Guest Introspection (4)"
if ($Vmstoignore -match $VmName) {
    Write-Output "does match"
}
else {
    Write-Output "doesn't match"
}

As of v7.2.x, PowerShell offers no comparison operators that accept an array of comparison values (only the input operand - the LHS - is allowed to be an array).
However, since the -match operator is regex-based, you can use a single regex with an alternation (|) to match multiple patterns.
Note that the regex pattern to match against must be the RHS (right-hand side) operand of -match (e.g. 'foo' -match '^f' - in your question, the operands are mistakenly reversed).
The following code shows how to construct the regex programmatically from the given, literal array elements (VM name prefixes):
# VM name prefixes to match.
$Vmstoignore = @("Guest Introspection", "Trend Micro Deep Security")
# Construct a regex with alternation (|) from the array, requiring
# the elements to match at the *start* (^) of the input string.
# The resulting regex is:
# ^Guest\ Introspection|^Trend\ Micro\ Deep\ Security
$regex = $Vmstoignore.ForEach({ '^' + [regex]::Escape($_) }) -join '|'
# Sample input name.
$VmName = "Guest Introspection (4)"
# Now match the input name against the regex:
# -> $true
$VmName -match $regex
Note:
You may alternatively construct the regex directly as a string, in which case you need to manually \-escape any regex metacharacters, such as . (no escaping is required with your sample array):
$regex = '^Guest Introspection|^Trend Micro Deep Security'
Note that [regex]::Escape() escapes each space as "\ " (a backslash followed by a space), even though spaces aren't metacharacters. However, if the x (IgnorePatternWhitespace) regex option is in effect (e.g. by placing (?x) at the start of the regex), spaces that are a meaningful part of the pattern do require this escaping. In the absence of this option (it is off by default), escaping spaces is not necessary.
For a detailed explanation of the regex and the ability to interact with it, see this regex101.com page.
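Because comparison operators act as filters when the LHS is an array, the same regex can also screen a whole list of names in one step. The following is a sketch only: $allVmNames and its contents are hypothetical sample data, not taken from the question.

```powershell
# VM name prefixes to ignore (from the question).
$Vmstoignore = @('Guest Introspection', 'Trend Micro Deep Security')

# Build the alternation regex as shown above.
$regex = $Vmstoignore.ForEach({ '^' + [regex]::Escape($_) }) -join '|'

# Hypothetical list of VM names to screen.
$allVmNames = @('Guest Introspection (4)', 'web01', 'Trend Micro Deep Security (2)', 'db01')

# With an array as the LHS, -notmatch returns the elements that do NOT match,
# i.e. the VMs that still need to be checked.
$toCheck = $allVmNames -notmatch $regex

$toCheck   # web01, db01
```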

Check if a string exists in an array even as a substring in PowerShell

I'm trying to work out if a string exists in an array, even if it's a substring of a value in the array.
I've tried a few methods and just can't get it to work, not sure where I'm going wrong.
I have the below code, you can see that $val2 exists within $val1, but I always get a FALSE when I run it.
$val1 = "folder1\folder2\folder3"
$val2 = "folder1\folder2"
$val3 = "folder9"
$val_array = @()
$val_array += $val1
$val_array += $val3
$null -ne ($val_array | ? { $val2 -match $_ }) # Expected $true, but returns $false
I also tried:
foreach ($item in $val_array) {
    if ($item -match $val2) {
        Write-Host "yes"
    }
}

The -Match operator performs a regular expression comparison, in which the backslash character (\) has a special meaning (it escapes the following character).
Instead you might use the -Like operator:
$val_array -Like "*$val2*"
Yields:
folder1\folder2\folder3

iRon's helpful answer offers the best solution to your problem, using wildcard matching via the -like operator.
Note:
In principle, the need to escape select characters so that a search pattern is taken verbatim also applies to the wildcard-based -like operator, not just to the regex-based -match operator. However, since wildcard expressions have far fewer metacharacters than regexes (namely just *, ?, and [), the need for such escaping rarely arises in practice. Whereas regexes use \ as the escape character, wildcards use `, and programmatic escaping can be achieved with [WildcardPattern]::Escape().
Unfortunately, as of PowerShell 7.2, there is no dedicated operator for verbatim substring matching:
A workaround for this limitation is to call the [string] .NET type's .Contains() method (on a single input string only). However, this method performs case-sensitive matching, whereas PowerShell operators are case-insensitive by default and offer case-sensitive variants via the c prefix (e.g., -clike, -cmatch).
In Windows PowerShell, .Contains() is invariably case-sensitive, but in PowerShell (Core) 7+ an additional overload is available that offers case-insensitive matching:
'Foo'.Contains('fo') # -> $false, due to case difference
# PowerShell (Core) 7+ *only*:
'Foo'.Contains('fo', 'InvariantCultureIgnoreCase') # -> $true
Caveat: Despite the name similarity, PowerShell's -contains operator does not perform substring matching; instead, it tests whether a collection contains a given element (in full).
As for what you tried:
Your primary problem is that you've accidentally swapped the -match operator's operands: the search pattern - which is invariably interpreted as a regex (regular expression) - must be on the RHS.
As iRon points out, in order for your search pattern to be taken verbatim (literally), you need to escape regex metacharacters with \, and the robust, programmatic way to do this is with [regex]::Escape().
Therefore, the immediate fix would have been (? is a built-in alias of the Where-Object cmdlet):
# OK, but SLOW.
$val_array | ? { $_ -match [regex]::Escape($val2) }
However, this solution is inefficient (it involves the pipeline and a cmdlet).
Fortunately, PowerShell's comparison operators can be applied to arrays (collections) directly, in which case they act as filters, i.e. they return the sub-array of matching elements - see the docs.
iRon's answer uses this technique with -like, but it equally works with -match, so that your expression can be simplified to the following, much more efficient form:
# MUCH FASTER.
$val_array -match [regex]::Escape($val2)
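The two escaping approaches discussed above can be compared in a short sketch. The third array element (with square brackets) is a hypothetical addition, included only to show a case where wildcard escaping matters.

```powershell
$paths = @('folder1\folder2\folder3', 'folder9', 'data\logs[1]\a.txt')

# Regex-based: escape the search string so that \ is taken literally.
$viaMatch = $paths -match [regex]::Escape('folder1\folder2')

# Wildcard-based, unescaped: [1] is a character set that matches just "1",
# so the path containing the literal text "logs[1]" is NOT found.
$unescaped = $paths -like '*logs[1]*'

# Wildcard-based, escaped: the brackets are now matched verbatim.
$escaped = $paths -like ('*' + [WildcardPattern]::Escape('logs[1]') + '*')

$viaMatch    # folder1\folder2\folder3
$escaped     # data\logs[1]\a.txt
```

Casting a result to [bool], e.g. [bool] $viaMatch, turns the filtered sub-array into a simple yes/no answer.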

Try the string method Contains:
$null -ne ($val_array | ? { $_.Contains($val2) })

If -match is case-insensitive, why do we need -imatch?

It seems redundant to provide -match and -imatch if -match is already case-insensitive. Is there any difference between them?

To elaborate on Doug Maurer's comment:
The i-prefixed variants of PowerShell operators that (also) operate on strings are never necessary. In fact, they are simply aliases of their non-prefixed forms, so that -imatch is the same as -match, for instance, and - with string input - always acts case-insensitively, as PowerShell generally does.
These variants exist for symmetry with the c-prefixed operator variants, which explicitly request case-sensitive operation (with string input).
In other words: you can use the i-prefixed variants to make it explicit that a given operation is case-insensitive.
However, to someone familiar with PowerShell's fundamentally case-insensitive nature, that isn't necessary - and that's probably why you rarely see the i-prefixed variants in practice.
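As a quick illustration of the three variants (the sample string is arbitrary):

```powershell
$text = 'PowerShell'

$default   = $text -match  'powershell'   # $true:  case-insensitive by default
$explicit  = $text -imatch 'powershell'   # $true:  same behavior, stated explicitly
$sensitive = $text -cmatch 'powershell'   # $false: case differs

$default, $explicit, $sensitive
```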

Literal Find and replace exact match. Ignore regex [duplicate]

I'm writing a PowerShell program to replace strings using
-replace "$in", "$out"
It doesn't work for strings containing a backslash. How can I escape it?

The -replace operator uses regular expressions, which treat backslash as a special character. You can use double backslash to get a literal single backslash.
In your case, since you're using variables, I assume that you won't know the contents at design time. In this case, you should run it through [RegEx]::Escape():
-replace [RegEx]::Escape($in), "$out"
That method escapes any characters that are special to regex with whatever is needed to make them a literal match (other special characters include ., $, ^, (), [], and more).
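A minimal sketch of the effect, assuming sample values for $in and $out (the paths are made up for illustration):

```powershell
$in   = 'C:\temp'
$out  = 'D:\data'
$line = 'path=C:\temp\file.txt'

# Unescaped: \t in the pattern means a tab character, so nothing matches
# and the input is returned unchanged.
$unescaped = $line -replace $in, $out                   # path=C:\temp\file.txt

# Escaped: the pattern becomes C:\\temp and matches the text literally.
$escaped = $line -replace [regex]::Escape($in), $out    # path=D:\data\file.txt

$unescaped, $escaped
```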

You'll need to either escape the backslash in the pattern with another backslash or use the .Replace() method instead of the -replace operator (but be advised they may perform differently):
PS C:\> 'asdf' -replace 'as', 'b'
bdf
PS C:\> 'a\sdf' -replace 'a\s', 'b'
a\sdf
PS C:\> 'a\sdf' -replace 'a\\s', 'b'
bdf
PS C:\> 'a\sdf' -replace ('a\s' -replace '\\','\\'), 'b'
bdf
Note that only the search pattern string needs to be escaped. The code -replace '\\','\\' says, "replace the escaped pattern string '\\', which is a single backslash, with the unescaped literal string '\\' which is two backslashes."
So, you should be able to use:
-replace ("$in" -replace '\\','\\'), "$out"
[Note: briantist's solution is better.]
However, if your pattern has consecutive backslashes, you'll need to test it.
Or, you can use the .Replace() string method, but as I said above, it may not perfectly match the behavior of the -replace operator:
PS C:\> 'a\sdf'.replace('a\\s', 'b')
a\sdf
PS C:\> 'a\sdf'.replace( 'a\s', 'b')
bdf

How do I strip part of a file name?

Suppose I have a file database_partial.xml.
I am trying to strip "_partial" as well as the extension (xml) from the file name and then capitalize the result so that it becomes DATABASE.
Param($xmlfile)
$xml = Get-ChildItem "C:\Files" -Filter "$xmlfile"
$db = [IO.Path]::GetFileNameWithoutExtension($xml).ToUpper()
That returns DATABASE_PARTIAL, but I don't know how to strip the _PARTIAL part.

You don't need GetFileNameWithoutExtension() for removing the extension. The FileInfo objects returned by Get-ChildItem have a property BaseName that gives you the filename without extension. Uppercase that, then remove the "_PARTIAL" suffix. I would also recommend processing the output of Get-ChildItem in a loop, just in case it doesn't return exactly one result.
Get-ChildItem "C:\Files" -Filter "$xmlfile" | ForEach-Object {
    $_.BaseName.ToUpper().Replace('_PARTIAL', '')
}
If the substring after the underscore can vary, use a regular expression replacement instead of a string replacement, e.g. like this:
Get-ChildItem "C:\Files" -Filter "$xmlfile" | ForEach-Object {
    $_.BaseName.ToUpper() -replace '_[^_]*$'
}

Ansgar Wiechers's helpful answer provides an effective solution.
To focus on the more general question of how to strip (remove) part of a file name (string):
Use PowerShell's -replace operator, whose syntax is: <stringOrStrings> -replace <regex>, <replacement>
<regex> is a regex (regular expression) that matches the part to replace.
<replacement> is the replacement operand (the string that replaces what the regex matched).
To remove what the regex matched, specify '' (the empty string) or simply omit the operand altogether; in either case, the matched part is removed from the input string.
For more information about -replace, see this answer.
Applied to your case:
PS> $db = 'DATABASE_PARTIAL' # sample input value
PS> $db -replace '_PARTIAL$', '' # removes suffix '_PARTIAL' from the end ($)
DATABASE
PS> $db -replace '_PARTIAL$' # ditto, with '' implied as the replacement string.
DATABASE
Note:
-replace is case-insensitive by default, as are all PowerShell operators. To explicitly perform case-sensitive matching, use the -creplace variant.
By contrast, the [string] type's .Replace() method (e.g., $db.Replace('_PARTIAL', '')):
matches by string literals only, and therefore offers less flexibility; in this case, you couldn't stipulate that _PARTIAL should only be matched at the end of the string, for instance.
is invariably case-sensitive in the .NET Framework (though .NET Core offers a case-insensitive overload).
Building on Ansgar's answer, your script can therefore be streamlined as follows:
Param($xmlfile)
$db = ((Get-ChildItem C:\Files -Filter $xmlfile).BaseName -replace '_PARTIAL$').ToUpper()
Note that in PSv3+ this works even if $xmlfile matches multiple files: thanks to member-access enumeration and the ability of -replace to accept an array of strings as input, the substring removal is performed on the base names of all files, as is the subsequent uppercasing. $db then receives an array of stripped base names.
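The array behavior can be tried without touching the file system. In this sketch, $baseNames is a hypothetical stand-in for the .BaseName values that Get-ChildItem would return.

```powershell
# Hypothetical base names, standing in for (Get-ChildItem ...).BaseName
$baseNames = @('database_partial', 'orders_partial', 'users')

# Fixed suffix: -replace processes each element (case-insensitively), then
# member-access enumeration applies .ToUpper() to each result.
$fixedSuffix = ($baseNames -replace '_PARTIAL$').ToUpper()

# Variable suffix: remove everything from the last underscore onward.
$anySuffix = $baseNames.ToUpper() -replace '_[^_]*$'

$fixedSuffix   # DATABASE, ORDERS, USERS
$anySuffix     # DATABASE, ORDERS, USERS
```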
