wildcard inside string behaves strangely - powershell

I've just started to learn PS. Can somebody explain the example below (and the reason for it, if possible)?
Get-Command s*rvice
gives hits like Set-Service, Start-Service, etc. but not commands like New-Service, Restart-Service, etc.

The * does not stand for a single character; it can match any number of characters. So s*rvice matches s[e]rvice, s[eeeeee]rvice and s[tart-se]rvice alike.
The pattern is matched against the whole name, and only the part covered by the wildcard may vary. You're expecting it to match New-Service as if it had wildcards at the start and end, like *s*rvice*, but it doesn't: it only matches the pattern you typed, so the name must start with s and end with rvice.
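The difference between the two patterns can be sketched outside PowerShell as well. The snippet below is only an illustration in Python: ps_like is a hypothetical helper that approximates PowerShell's case-insensitive, whole-string wildcard matching with the standard fnmatch module, and the command list is just the four names mentioned in the question.

```python
from fnmatch import fnmatchcase


def ps_like(name: str, pattern: str) -> bool:
    """Whole-string, case-insensitive wildcard match (rough stand-in for PowerShell wildcards)."""
    return fnmatchcase(name.lower(), pattern.lower())


commands = ["Set-Service", "Start-Service", "New-Service", "Restart-Service"]

# The pattern must cover the whole name: starts with "s", ends with "rvice".
anchored = [c for c in commands if ps_like(c, "s*rvice")]

# Leading and trailing wildcards let the pattern float anywhere in the name.
floating = [c for c in commands if ps_like(c, "*s*rvice*")]

print(anchored)  # ['Set-Service', 'Start-Service']
print(floating)  # ['Set-Service', 'Start-Service', 'New-Service', 'Restart-Service']
```

In PowerShell itself, the second list corresponds to Get-Command *s*rvice*.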


Regsubbing simple matches

I'm looking for a regsub example that does the following:
123tcl456TCL789 => 123!tcl!456!TCL!789
This is an Tcl example => This is an !Tcl! example
Yes, I could use string first to find a position and mash things together, but in the past I saw a regsub command that does what I want, and I can't recall it. What regsub command would do that? I would guess regsub -all -nocase is a start.
I am bad at regsub and regexps. I wonder if there is a site or tool/script where we can supply a string and the desired result and get back the regsub form.
You're looking at the right tool, but there are various options, depending on exactly what the conditions are when faced with other text. Here's one that wraps each occurrence of "Tcl" (any capitalisation) with exclamation marks:
set inputString "123tcl456TCL789"
set replaced [regsub -all -nocase {tcl} $inputString {!&!}]
puts $replaced
That's using a very simple regular expression with the -nocase option, and the replacement means "put ! on either side of the substring matched".
Another (more generally applicable... perhaps) might be to put ! after any sequence of letters that is followed by a digit, and after any sequence of digits that is followed by a letter.
set replaced [regsub -all {[A-Za-z]+(?=[0-9])|[0-9]+(?=[A-Za-z])} $inputString {&!}]
Note that doing things correctly typically requires understanding the real input data fairly well. For example, whether the numbers include floating point numbers in scientific notation, or whether the substrings to delimit are of fixed length.
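For comparison, here is a minimal sketch of the same two substitutions in Python, assuming the inputs from the question. Python's \g<0> plays the role of Tcl's & (the whole matched substring), and re.IGNORECASE corresponds to -nocase; the variable names are made up for illustration.

```python
import re

input_string = "123tcl456TCL789"

# Wrap every occurrence of "tcl" (any capitalisation) in exclamation marks.
wrapped = re.sub(r"tcl", r"!\g<0>!", input_string, flags=re.IGNORECASE)

# Put "!" after a letter run followed by a digit, or a digit run followed by a letter.
boundaries = re.sub(
    r"[A-Za-z]+(?=[0-9])|[0-9]+(?=[A-Za-z])", r"\g<0>!", input_string
)

sentence = re.sub(r"tcl", r"!\g<0>!", "This is an Tcl example", flags=re.IGNORECASE)

print(wrapped)     # 123!tcl!456!TCL!789
print(boundaries)  # 123!tcl!456!TCL!789
print(sentence)    # This is an !Tcl! example
```

Both approaches give the same result on the first input, but only the first one handles the sentence, because there the word is delimited by spaces rather than digits.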

match string pattern by certain characters but exclude combinations of those characters

I have the following sample string:
'-Dparam="x" -f hello-world.txt bye1.txt foo_bar.txt -Dparam2="y"'
I am trying to use RegEx (PowerShell, .NET flavor) to extract the filenames hello-world.txt, bye1.txt, and foo_bar.txt.
The real use case could have any number of -D parameters, and the -f <filenames> argument could appear in any position between these other parameters. I can't easily use something like split to extract it as the delimiter positioning could change, so I thought RegEx might be a good proposition here.
My attempt is something like this in PowerShell (can be opened on any Windows system and copy pasted into it):
'-Dparam="x" -f hello-world.txt bye1.txt foo_bar.txt -Dparam2="y"' -replace '^.* -f ([a-zA-Z0-9_.\s-]+).*$','$1'
Desired output:
hello-world.txt bye1.txt foo_bar.txt
My problem is that I either only take hello-world.txt, or I get hello-world.txt all the way to the end of the string or next = symbol (as in the example above).
I am having trouble expressing that \s is allowed, since I need to capture multiple space-delimited filenames, but that the combination of \s-[a-zA-Z] is not allowed, as that indicates the start of the next argument.
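One possible way to express "whitespace is allowed, but not when the next token starts with -" is a negative lookahead before each token. The sketch below shows the idea in Python rather than PowerShell; extract_files and FILES_AFTER_F are hypothetical names, and it assumes that filenames never start with - and contain no spaces. The constructs used (non-capturing groups, (?!...), \s, \S) also exist in the .NET flavor.

```python
import re

# "-f", then one or more whitespace-separated tokens, none of which starts with "-".
FILES_AFTER_F = re.compile(r"(?:^|\s)-f\s+((?!-)\S+(?:\s+(?!-)\S+)*)")


def extract_files(command_line: str) -> list[str]:
    """Return the filenames that follow -f, or an empty list if -f is absent."""
    match = FILES_AFTER_F.search(command_line)
    return match.group(1).split() if match else []


sample = '-Dparam="x" -f hello-world.txt bye1.txt foo_bar.txt -Dparam2="y"'
print(extract_files(sample))  # ['hello-world.txt', 'bye1.txt', 'foo_bar.txt']
```

The lookahead (?!-) is checked at the start of every token, so the repetition stops as soon as the next argument (here -Dparam2) begins, wherever -f appears in the string.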

What am I allowed to name a function in Powershell?

PS > function ]{1}
PS > ]
1
PS >
PS
Why does this work?
What else can I name a function? All I've found so far that works is * and ].
You can name it almost anything. You can even include newlines and emoji* in the name.
function Weird`nFunctionの名前😀 { Write-Host hey }
$c = gcm Weird*
$c.Name
& $c
Escaping helps with lots of things like that:
function `{ { Write-Host cool }
`{
function `0 { Write-Host null }
gci function:\?
I'll add that this is true for variables too, and there's a syntax that removes the need to do most escaping in the variable name: ${varname} (as opposed to $varname).
With that, you could easily do:
${My variable has a first name,
it's
V
A
something
R,
whatever I dunno
🤷} = Get-Process
You'll note that if you then type $My and press TAB, it will tab complete in a usable way.
To (somewhat) answer why this should work, consider that the variable names themselves are just stored in .NET strings. With that in mind, why should there be a limit on the name?
There will be limits on how some of these names can be used in certain contexts, because the parser will not understand what to do with them unless certain characters are escaped. But literal parsing of PowerShell scripts is not the only way to use functions, variables or other language constructs, as the examples above show.
Being less limiting also means being able to support other languages and cultures by having wide support for character sets.
To this end, here's one more thing that might surprise you: there are many different characters to represent the same or similar things that we take for granted in code, like quotation marks for example.
Some (human) languages or cultures just don't use the same quote characters we do in English, don't even have them on the keyboard. How annoying would it be to type code if you have to keep switching your keyboard layout or use ALT codes to quote strings?
So what I'm getting at here is that PowerShell actually does support many quote characters, for instance, what do you think this might do:
'Hello’
Pretty obvious it's not the "right" set of quotes on the right side. But surprisingly, this works just fine, even though they aren't the same character.
This does have important implications if you're ever generating code from user input and want to avoid sneaky injection attacks.
Imagine you did something like this:
Invoke-Expression "echo '$($userMsg -replace "'","''")'"
Looks like you took care of business, but now imagine if $userMsg contained this:
Hi’; gci c: -recurse|ri -force -whatif;'
For what it's worth, the CodeGeneration class is aware of this stuff ;)
Invoke-Expression "echo '$([System.Management.Automation.Language.CodeGeneration]::EscapeSingleQuotedStringContent($userMsg))'"
* PowerShell Console doesn't have good support for Unicode, even though the language does. Use ISE to better see the characters.
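To make the injection problem concrete, here is a small Python sketch of the two escaping strategies. It is not the CodeGeneration implementation; it assumes that the characters the PowerShell parser accepts as a single quote are U+0027, U+2018, U+2019, U+201A and U+201B, and the function names are made up for illustration.

```python
# Assumption: these are the characters PowerShell treats as a single quote.
SINGLE_QUOTE_LIKE = "\u0027\u2018\u2019\u201a\u201b"


def naive_escape(text: str) -> str:
    """Doubles only the ASCII apostrophe, like -replace "'","''"."""
    return text.replace("'", "''")


def escape_single_quoted(text: str) -> str:
    """Doubles every quote-like character so none of them can close the string early."""
    return "".join(ch * 2 if ch in SINGLE_QUOTE_LIKE else ch for ch in text)


user_msg = "Hi\u2019; gci c: -recurse|ri -force -whatif;'"

print(naive_escape(user_msg))          # the curly quote is left alone and still ends the string
print(escape_single_quoted(user_msg))  # the curly quote is doubled as well
```

With the naive version the lone ’ survives, so the generated code still ends the string after Hi and runs the rest as commands.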

Wildcard searching between words with CRC mode in Sphinx

I use Sphinx with CRC mode and min_infix_len = 1, and I want to use wildcard searching between the characters of a keyword. Assume I have data like this in my index files:
name
-------
mickel
mick
mickol
mickil
micknil
nickol
nickal
and when I search for all records whose name starts with 'mick' and ends with 'l':
select * from all where match ('mick*l')
I expect the results should be like this:
name
-------
mickel
mickol
mickil
micknil
but nothing is returned. How can I do that?
I know that I can do this in dict=keywords mode, but I have to use CRC mode for other reasons.
I also tried the '^' and '$' operators, and they didn't work.
You can't use 'middle' wildcards with CRC. That is one of the reasons for dict=keywords: the wildcards it can support are much more flexible.
With CRC, the indexer 'precomputes' all the wildcard combinations and injects them as separate keywords in the index.
E.g. for mickel as a document word, and with min_prefix_len=1, the indexer will create the words:
mickel
mickel*
micke*
mick*
mic*
mi*
m*
... as words in the index, so all the combinations can match. If using min_infix_len, it also has to do all the combinations at the start as well (so roughly (word_length)^2 + 1 combinations).
... if it had to precompute all the combinations for wildcards in the middle too, that would be a lot more again (particularly if it then allowed middle AND start/end combinations together).
Although having said that, you can rewrite
select * from all where match ('mick*l')
as
select * from all where match ('mick* *l')
because with min_infix_len, the start and end will be indexed as separate words. You just need to insist that both match (although I can't think how to make them both match the same word!).
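The precomputation described above, and one way around the "same word" caveat, can be sketched in Python. This is only an illustration, not Sphinx's actual implementation: prefix_keywords is a hypothetical helper that reproduces the list shown for mickel, and the second part assumes you are willing to post-filter the rows returned by 'mick* *l' in application code.

```python
from fnmatch import fnmatchcase


def prefix_keywords(word: str, min_prefix_len: int = 1) -> list[str]:
    """The exact word plus a starred keyword for every prefix of at least min_prefix_len chars."""
    starred = [word[:n] + "*" for n in range(len(word), min_prefix_len - 1, -1)]
    return [word] + starred


print(prefix_keywords("mickel"))
# ['mickel', 'mickel*', 'micke*', 'mick*', 'mic*', 'mi*', 'm*']

# Hypothetical client-side post-filter: keep only names where a single word
# satisfies the middle wildcard.
names = ["mickel", "mick", "mickol", "mickil", "micknil", "nickol", "nickal"]
same_word = [n for n in names if fnmatchcase(n, "mick*l")]
print(same_word)  # ['mickel', 'mickol', 'mickil', 'micknil']
```

For single-word fields like the ones in the question the post-filter is redundant, but it matters when a field holds several words and mick* and *l could be satisfied by different ones.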

Partial String Replacement using PowerShell

Problem
I am working on a script that has a user provide a specific IP address, and I want to mask this IP in some fashion so that it isn't stored in the logs. My problem is that I can easily do this when I know what the first three values of the IP typically are; however, I want to avoid storing/hard-coding those values in the code if at all possible. I also want to be able to replace the values even if the first three are unknown to me.
Examples:
10.11.12.50 would display as XX.XX.XX.50
10.12.11.23 would also display as XX.XX.XX.23
I have looked up partial string replacements, but none of the questions or problems that I found came close to doing this. I have tried doing things like:
# This ended up replacing all of the numbers
$tempString = $str -replace '[0-9]', 'X'
I know that I am partway there, but I am aiming to replace only the first 3 sets of digits (basically, every digit that is before a '.'), and I haven't been able to achieve this.
Question
Is what I'm trying to do possible to achieve with PowerShell? Is there a best practice way of achieving this?
Here's an example of how you can accomplish this:
Get-Content 'File.txt' |
ForEach-Object { $_ -replace '\d{1,3}\.\d{1,3}\.\d{1,3}', 'XX.XX.XX' }
This example matches 1-3 digits, a literal period, and repeats that pattern, so it matches anything from 0-999.0-999.0-999 and replaces it with XX.XX.XX. The result of -replace is emitted to the pipeline for each line.
TheIncorrigible1's helpful answer is an exact way of solving the problem (replacement only happens if 3 consecutive .-separated groups of 1-3 digits are matched.)
A looser, but shorter solution that replaces everything but the last .-prefixed digit group:
PS> '10.11.12.50' -replace '.+(?=\.\d+$)', 'XX.XX.XX'
XX.XX.XX.50
(?=\.\d+$) is a (positive) lookahead assertion ((?=...)) that matches the enclosed subexpression (a literal . followed by 1 or more digits (\d) at the end of the string ($)), but doesn't capture it as part of the overall match.
The net effect is that only what .+ captured - everything before the lookahead assertion's match - is replaced with 'XX.XX.XX'.
Applied to the above example input string, 10.11.12.50:
(?=\.\d+$) matches the .-prefixed digit group at the end, .50.
.+ matches everything before .50, which is 10.11.12.
Since the (?=...) part isn't captured, it is therefore not included in what is replaced, so it is only substring 10.11.12 that is replaced, namely with XX.XX.XX, yielding XX.XX.XX.50 as a result.
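The behaviour of the two approaches can be checked with a short Python sketch, since the lookahead syntax is the same there. mask_loose uses the lookahead pattern from above unchanged; mask_strict is a hypothetical variant that anchors the three leading groups and is shown only for contrast.

```python
import re


def mask_loose(ip: str) -> str:
    """Replace everything before the last .-prefixed digit group."""
    return re.sub(r".+(?=\.\d+$)", "XX.XX.XX", ip)


def mask_strict(ip: str) -> str:
    """Replace only three leading groups of 1-3 digits that are followed by a final group."""
    return re.sub(r"^\d{1,3}\.\d{1,3}\.\d{1,3}(?=\.\d{1,3}$)", "XX.XX.XX", ip)


print(mask_loose("10.11.12.50"))    # XX.XX.XX.50
print(mask_loose("10.12.11.23"))    # XX.XX.XX.23
print(mask_loose("server-10.50"))   # XX.XX.XX.50  (loose: input is not an IP)
print(mask_strict("server-10.50"))  # server-10.50 (strict: left untouched)
```

The loose pattern is shorter but will rewrite any string ending in a dot and digits, so the strict form may be the safer choice when the input has not been validated as an IP address.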