Powershell - Replacing substrings with wildcards


I am writing a function in powershell, and part of it needs to replace occurrences of substrings with a wildcard. Strings will look something like this:
"something-#{reference}-somethingElse-#{anotherReference}-01"
I want it to end up looking like this:
"something-*-somethingElse-*-01"
The problem I have here is that I don't know what "#{something}" will be, just that there will be multiple substrings enclosed in curly braces and preceded by a hash sign (#). I've tried the Replace method like so:
$newString = $originalString.Replace('#{*}', '*')
I was hoping that would replace everything from the hashtag to the ending curly brace, but it doesn't work like that. I'm trying to avoid cumbersome code that is based on finding the indices of '#' and '}' and then replacing, and hoping there is a simpler and more elegant solution.

Your replace has at least one problem, possibly two:
The method $string.Replace() comes from the .NET Framework String class. You are calling it from PowerShell, but it's exactly what you'd get in C#, with minimal PowerShell script-y convenience added on top, and it's for literal text replacements only: it supports neither wildcards nor regular expressions.
The 'wildcard' support in PowerShell is quite limited; as far as I know, it's only the -like operator. That can't do text replacement, and it's a convenience for people who don't know regular expressions; behind the scenes it converts to a regular expression anyway. So the dream of an a*b-style replace won't work either.
As @PetSerAl comments, regular expressions and the PowerShell -replace operator are the PowerShell way to do any string pattern replacement quickly and without .IndexOf().
Their pattern #{[^}]*} breaks down as:
#{ and } on the outside, as literal characters
[^}] as a character class saying "not a } character, but anything else"
[^}]* - as many non-} characters as there are (including none).
So: match the hash and opening brace, then everything that isn't the closing brace (to avoid overrunning past the closing brace), then the closing brace. Replace it all with a literal *.
Implicitly, -replace does that search/replace as many times as possible in the input string.
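The pattern can be checked outside PowerShell. The sketch below uses Python's re module as a stand-in for -replace (in PowerShell the equivalent line is presumably $originalString -replace '#{[^}]*}', '*'). It also shows the overrun the negated class avoids: a greedy #{.*} swallows everything from the first #{ to the last }.

```python
import re

# Literal "#{", then zero or more characters that are not "}", then a literal "}".
# The braces do not form a valid quantifier here, so the engine treats them as literals.
PLACEHOLDER = re.compile(r"#{[^}]*}")

def wildcard_placeholders(text: str) -> str:
    """Replace every #{...} placeholder with a literal '*' (all matches, like -replace)."""
    return PLACEHOLDER.sub("*", text)

original = "something-#{reference}-somethingElse-#{anotherReference}-01"
print(wildcard_placeholders(original))   # something-*-somethingElse-*-01

# For contrast: a greedy ".*" runs from the first "#{" to the LAST "}".
print(re.sub(r"#{.*}", "*", original))   # something-*-01
```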

Related

Powershell escaping quotation marks in a variable that is used in another variable

I've found plenty of posts explaining how to literally escape both single and double quotation marks, using """" for one double quotation mark or '''' for a single quotation mark (or just `"). I find myself in a situation where I need to search through a list of names that is input to a different query:
Foreach($Username in $AllMyUsers.Username){
$Filter = "User = '$Username'"
# do some more code here using $Filter
}
The problem occurs when I reach a username like O'toole or O'brian, which contains a single quotation mark. If the string were literal, I could escape it with
O`'toole
O''brian
etc.
But since it's in a loop, I need to escape the quotation mark for each user.
I tried to use [regex]::Escape(), but that doesn't escape quotation marks.
I could probably do something like $Username.Replace("'","''"), but it feels like there should be a more generic solution than manually escaping the quotation marks. In other circumstances I might need to escape both single and double quotes, and just chaining .Replace calls like $VariableName.Replace("'","''").Replace('"','""') doesn't feel like the most efficient way to code.
Any help is appreciated!
EDIT: This feels exactly like a "how can I avoid SQL injection?" question but for handling strings in Powershell. I guess I'm looking for something like mysqli_real_escape_string but for Powershell.

I could probably do something like $Username.Replace("'","''") but it feels like there should be a more generic solution than having to manually escape the quotation marks
As implied by Mathias R. Jessen's comment, there is no generic solution, given that you're dealing with an embedded '...' string, and the escaping requirements entirely depend on the ultimate target API or utility - which is unlikely to be PowerShell itself (where ' inside a '...' string must be doubled, i.e. represented as '').
In the case at hand, where you're assigning the filter string to the .RowFilter property of a System.Data.DataView (such as a System.Data.DataTable's .DefaultView), '' is required as well.
The conceptually cleanest way to handle the required escaping is to use -f, the format operator, combined with a separate string-replacement operation:
$Filter = "User = '{0}'" -f ($UserName -replace "'", "''")
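For comparison, here is the same "escape first, then format" idea as a Python sketch; build_filter is a hypothetical helper name, and str.format plays the role of PowerShell's -f. It doubles single quotes only, which is what a '...' literal in the filter expression needs.

```python
def build_filter(username: str) -> str:
    """Hypothetical helper: embed a user name in a '...' filter literal.

    Single quotes are doubled first, mirroring ($UserName -replace "'", "''"),
    and the escaped value is then substituted, like PowerShell's -f operator.
    """
    escaped = username.replace("'", "''")
    return "User = '{0}'".format(escaped)

for name in ["Smith", "O'toole", "O'brian"]:
    print(build_filter(name))
# User = 'Smith'
# User = 'O''toole'
# User = 'O''brian'
```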
Note how PowerShell's -replace operator - rather than the .NET [string] type's .Replace() method - is used to perform the replacement.
Aside from being more PowerShell-idiomatic (with respect to: syntax, being case-insensitive by default, accepting arrays as input, converting to strings on demand), -replace is regex-based, which also makes performing multiple replacements easier.
To demonstrate with your hypothetical .Replace("'","''").Replace('"','""') example:
@'
I'm "fine".
'@ -replace '[''"]', '$0$0'
Output:
I''m ""fine"".
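The same single-pass doubling can be sketched with Python's re.sub, where \g<0> plays the role of .NET's $0 (the whole match). The character class ['"] matches either quote type, so one substitution replaces the chained .Replace() calls.

```python
import re

def double_quotes(text: str) -> str:
    # ['"] matches one single or double quote; \g<0> re-inserts the whole match,
    # so writing it twice doubles each quote character in a single pass.
    return re.sub("['\"]", r"\g<0>\g<0>", text)

print(double_quotes('I\'m "fine".'))   # I''m ""fine"".
```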

Oddities in fail2ban regex

This appears to be a bug in fail2ban, with different behaviour between the fail2ban-regex tool and a failregex filter
I am attempting to develop a new regex rule for fail2ban, to match:
\"%20and%20\"x\"%3D\"x
When using fail2ban-regex, this appears to produce the desired result:
^<HOST>.*GET.*\\"%20and%20\\"x\\"%3D\\"x.* 200.*$
As does this:
^<HOST>.*GET.*\\\"%20and%20\\\"x\\\"%3D\\\"x.* 200.*$
However, when I put either of these into a filter, I get the following error:
Failed during configuration: '%' must be followed by '%' or '(', found:…
To have this work in a filter, you have to double up the ‘%’, i.e. ‘%%’:
^<HOST>.*GET.*\\\"%%20and%%20\\\"x\\\"%%3D\\\"x.* 200.*$
While this gets the required hits running as a filter, it gets none running through fail2ban-regex.
I tried the \\\\ as Andre suggested below, but this gets no results in fail2ban-regex.
So, as this appears to be differential behaviour, I am going to file it as a bug.
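The error text in the question is the message Python's configparser raises from its default (basic) interpolation, which treats % as special. That suggests, though this is an inference rather than something confirmed here, that filter files go through this interpolation while a regex passed directly to fail2ban-regex on the command line does not. A minimal sketch of the plain configparser behavior (not fail2ban's own reader):

```python
import configparser

def read_failregex(conf_text: str) -> str:
    """Parse an INI-style snippet and return the interpolated failregex value."""
    parser = configparser.ConfigParser()  # uses BasicInterpolation by default
    parser.read_string(conf_text)
    return parser.get("Definition", "failregex")

# A doubled "%%" is turned back into a single "%" when the value is read.
print(read_failregex("[Definition]\nfailregex = x%%3Dx\n"))   # x%3Dx

# A lone "%" is rejected when the value is retrieved.
try:
    read_failregex("[Definition]\nfailregex = x%3Dx\n")
except configparser.InterpolationSyntaxError as err:
    print(err)   # '%' must be followed by '%' or '(', found: '%3Dx'
```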

According to Python's own site, a single backslash "\" has to be written as "\\\\", and there's no mention of %.
Regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This collides with Python’s usage of the same character for the same purpose in string literals; for example, to match a literal backslash, one might have to write '\\\\' as the pattern string, because the regular expression must be \\, and each backslash must be expressed as \\ inside a regular Python string literal.
I would just go with:
failregex = (?i)^<HOST> -.*"(GET|POST|HEAD|PUT).*20and.*3d.*$
The .* will match anything in between anyway, and (?i) makes the entire regex case-insensitive.
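A quick way to sanity-check the simplified failregex is Python's re module. fail2ban expands <HOST> itself, so the sketch substitutes a crude stand-in, (?P<host>\S+), which is an assumption and much looser than fail2ban's real host pattern; the two log lines are made-up examples. Note that this regex contains no %, so it would not need any %% doubling in a filter file.

```python
import re

# Simplified failregex from the answer, with fail2ban's <HOST> tag replaced by a
# crude stand-in (assumption for testing only; fail2ban substitutes its own pattern).
FAILREGEX = r'(?i)^<HOST> -.*"(GET|POST|HEAD|PUT).*20and.*3d.*$'
pattern = re.compile(FAILREGEX.replace("<HOST>", r"(?P<host>\S+)"))

# Made-up access-log lines for illustration.
attack = '203.0.113.5 - - [01/Jan/2024:00:00:00 +0000] "GET /p?id=1\\"%20and%20\\"x\\"%3D\\"x HTTP/1.1" 200 512'
normal = '198.51.100.7 - - [01/Jan/2024:00:00:01 +0000] "GET /index.html HTTP/1.1" 200 1024'

m = pattern.search(attack)
print(m.group("host") if m else None)   # 203.0.113.5
print(pattern.search(normal))            # None
```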

Not able to understand a command in perl

I need help understanding what the command below is doing exactly:
$abc{hier} =~ s#/tools.*/dfII/?.*##g;
and $abc{hier} contains a path "/home/test1/test2/test3"
Can someone please let me know what the above command is doing exactly. Thanks

s/PATTERN/REPLACEMENT/ is Perl's substitution operator. It searches a string for text that matches the regex PATTERN and replaces it with REPLACEMENT.
By default, the substitution operator works on $_. To tell it to work on a different variable, you use the binding operator - =~.
The default delimiter used by the substitution operator is a slash (/) but you can change that to any other character. This is useful if your PATTERN or your REPLACEMENT contains a slash. In this case, the programmer has used # as the delimiter.
To recap:
$abc{hier} =~ s#PATTERN#REPLACEMENT#;
means "look for text in $abc{hier} that matches PATTERN and replace it with REPLACEMENT".
The substitution operator also has various options that change its behaviour. They are added by putting letters after the final delimiter. In this case we have a g. That means "make the substitution global" - or match and change all occurrences of PATTERN.
In your case, the REPLACEMENT string is empty (we have two # characters next to each other). So we're replacing the PATTERN with nothing - effectively deleting whatever matches PATTERN.
So now we have:
$abc{hier} =~ s#PATTERN##g;
And we know it means, "in the variable $abc{hier}, look for any string that matches PATTERN and replace it with nothing".
The last thing to look at is the PATTERN (or regular expression - "regex"). You can get the full definition of regexes in perldoc perlre. But to explain what we're using here:
/tools : is the fixed string "/tools"
.* : is zero or more of any character
/dfII : is the fixed string "/dfII"
/? : is an optional slash character
.* : is (again) zero or more of any character
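So, basically, we're removing part of a file path stored as a value in a hash: everything from "/tools" onward, provided "/dfII" appears somewhere after it. With the value from the question, "/home/test1/test2/test3", there is no "/tools", so nothing matches and the value is left unchanged.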
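To see the substitution in action, here is a rough Python equivalent using re.sub, which, like Perl's /g, replaces every match. The second path is a made-up example containing both /tools and /dfII; the question's own value comes back unchanged.

```python
import re

def strip_tools_dfii(path: str) -> str:
    # Same pattern as s#/tools.*/dfII/?.*##g: when "/dfII" appears somewhere after
    # "/tools", delete from "/tools" to the end of the string. re.sub replaces all
    # matches, like Perl's /g flag.
    return re.sub(r"/tools.*/dfII/?.*", "", path)

print(strip_tools_dfii("/home/test1/test2/test3"))         # /home/test1/test2/test3
print(strip_tools_dfii("/proj/tools/v1/dfII/lib/cells"))   # /proj
```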

This =~ means "Do a regex operation on that variable."
(Actually, as ikegami correctly reminds me, it is not necessarily only regex operations, because it could also be a transliteration.)
The operation in question is s#something#else#, which means replace the "something" with something "else".
The g at the end means "Do it for all occurrences of something."
Since the "else" is empty, the replacement has the effect of deleting.
The "something" is a definition according to regex syntax, roughly it means "Starting with '/tools' and later containing '/dfII', followed pretty much by anything until the end."
Note that the regex ends with /?.*. In detail, this means "a slash (/), or maybe not (?), and then absolutely anything (.) any number of times, including zero times (*)". Strictly speaking, "slash or not" is unnecessary when it is followed by "anything, any number of times", because "anything" includes a slash, and "any number of times" includes zero or one time, whether or not more "anything" follows. I.e. the /? could be omitted without changing the behaviour.
(Thanks to ikegami for confirming.)
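The claim that /? is redundant can be spot-checked with a small Python sketch that runs the pattern with and without /? over a few made-up paths and compares the results. A handful of inputs is an illustration, not a proof; the argument above is what establishes the equivalence.

```python
import re

WITH_OPTIONAL_SLASH = re.compile(r"/tools.*/dfII/?.*")
WITHOUT_OPTIONAL_SLASH = re.compile(r"/tools.*/dfII.*")

# Made-up sample paths.
samples = [
    "/home/test1/test2/test3",
    "/a/tools/x/dfII",
    "/a/tools/x/dfII/",
    "/a/tools/x/dfII/y/z",
    "/a/tools/dfIIextra",
]

for s in samples:
    a = WITH_OPTIONAL_SLASH.sub("", s)
    b = WITHOUT_OPTIONAL_SLASH.sub("", s)
    # Both versions delete the same span, because ".*" can absorb the slash itself.
    print(repr(s), "->", repr(a), a == b)
```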

$abc{hier} =~ s#/tools.*/dfII/?.*##g;
The above command uses a regular expression to strip the trailing /tools.*/dfII or
/tools.*/dfII/.* from the value of the hier member of the %abc hash.
It is pretty basic Perl, except for the non-standard regular expression delimiters (# instead of the standard /). They avoid having to escape / inside the regular expression (s/\/tools.*\/dfII\/?.*//g).
My personal preferred style-guide would make it s{/tools.*/dfII/?.*}{}g .

How does PowerShell decide which mode to use when parsing?

I can't quite understand how PowerShell parses commands and need your help.
I read the following explanation by Microsoft's about_parsing documentation:
When processing a command, the PowerShell parser operates in expression mode or in argument mode:
In expression mode, character string values must be contained in quotation marks. Numbers not enclosed in quotation marks are treated as numerical values (rather than as a series of characters).
In argument mode, each value is treated as an expandable string unless it begins with one of the following special characters: dollar sign ($), at sign (@), single quotation mark ('), double quotation mark ("), or an opening parenthesis (().
If preceded by one of these characters, the value is treated as a value expression.
I can understand when parsing a command, PowerShell uses either expression mode or argument mode, but I can't quite understand the following examples.
$a = 2+2
Write-Output $a #4(int), expression mode
Write-Output $a/H #4/H(str), argument mode
I wonder whether PowerShell expands the variable first and then decides which mode to use when parsing. Is that right?
If so, there's another question about data type.
It seems reasonable to me that the former command produces an integer, but the latter doesn't. Why can the integer 4 be put next to the string /H?
I tried this example and it worked. It seems variables turn into strings, whatever their data type, when they are expanded. Is that right?
$b = 100
Add-Content C:\Users\Owner\Desktop\$b\test.txt 'test'
I appreciate your help.
Edit: clarified after receiving a comment
I've got the comment that the both Write-Output examples are argument mode, so can the examples be interpreted like this?
Write-Output "$a"
Write-Output "$a/H"
I'm terribly sorry the question was so ambiguous, but I want to know:
In argument mode, are the double quotation marks implied (i.e., can they be omitted)?
The Write-Output examples are quoted from the Microsoft document I linked, and it says the first example produces an integer. Is that wrong?

Funky 'x' usage in perl

My usual 'x' usage was:
print("#" x 78, "\n");
which repeats the string "#" 78 times. But recently I came across this code:
while (<>) { print if m{^a}x }
Which prints every line of input starting with an 'a'. I understand the regexp matching part (m{^a}), but I really don't see what that 'x' is doing here.
Any explanation would be appreciated.

It's a modifier for the regex. The x modifier tells perl to ignore whitespace and comments inside the regex.
In your example code it does not make a difference because there are no whitespace or comments in the regex.

The "x" in your first case is the repetition operator, which takes the string as its left operand and the number of times to repeat it as its right operand. Perl 6 can replicate lists using the "xx" repetition operator.
Your second example uses the regular expression m{^a}x. While you may use many different types of delimiters, neophytes may like to use the familiar notation, which uses a forward slash: m/^a/x
The "x" in a regex is called a modifier or a flag, and it is but one of many optional flags that may be used. It makes the regex ignore whitespace in the pattern, and it also allows normal comments inside. Because regex patterns can get really long and confusing, whitespace and comments are very helpful.
Your example is very short (all it checks is whether the line starts with "a"), so you probably wouldn't need whitespace or comments, but you could use them if you wanted to.
Example:
m/^a # first letter is an 'a'
# <-- you can put more regex on this line because whitespace is ignored
# <-- and more here if you want
/x
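Python's closest counterpart to Perl's /x flag is re.VERBOSE (re.X), which likewise ignores unescaped whitespace in the pattern and treats # as the start of a comment. A sketch of the question's filter, written both ways and run on made-up input lines:

```python
import re

lines = ["apple\n", "banana\n", "avocado\n", " a leading space\n"]

compact = re.compile(r"^a")
verbose = re.compile(r"""
    ^a      # first character of the line is an 'a'
""", re.VERBOSE)

for line in lines:
    # Equivalent of Perl's: print if m{^a}x
    if verbose.search(line):
        print(line, end="")
# apple
# avocado
```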

In this use case, 'x' is a regex modifier which "Extends your pattern's legibility by permitting whitespace and comments," according to the Perl documentation. However, it seems redundant here.
