Powershell Replace String on regex match - powershell

Need to replace strings after pattern matching. Using powershell v4.
Log line is -
"08:02:37.961" level="DEBUG" "Outbound message: [32056][Sent: HTTP]" threadId="40744"
Need to remove level and threadId completely. Expected line is -
"08:02:37.961" "Outbound message: [32056][Sent: HTTP]"
Have already tried following but did not work -
$line.Replace('level="\w+"','')
AND
$line.Replace('threadId="\d+"','')
Help needed with correct replace command. Thanks.

Try this regex:
$line = '"08:02:37.961" level="DEBUG" "Outbound message: [32056][Sent: HTTP]" threadId="40744"'
$line -replace '(\s*(level|threadId)="[^"]+")'
Result:
"08:02:37.961" "Outbound message: [32056][Sent: HTTP]"
Regex details:
( # Match the regular expression below and capture its match into backreference number 1
\s # Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
( # Match the regular expression below and capture its match into backreference number 2
# Match either the regular expression below (attempting the next alternative only if this one fails)
level # Match the characters “level” literally
| # Or match regular expression number 2 below (the entire group fails if this one fails to match)
threadId # Match the characters “threadId” literally
)
=" # Match the characters “="” literally
[^"] # Match any character that is NOT a “"”
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
" # Match the character “"” literally
)

The .Replace() method does literal string replacement and does not use regex (see https://learn.microsoft.com/en-us/dotnet/api/system.string.replace?view=netframework-4.8). The -replace operator does.
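The same literal-versus-regex distinction can be sketched outside PowerShell. The snippet below is a minimal illustration in Python, not the PowerShell answer itself: `str.replace` stands in for `.Replace()` and `re.sub` stands in for `-replace`. The variable names are made up for the example, and the capture groups from the answer are written as a non-capturing group because nothing reads them.

```python
import re

line = '"08:02:37.961" level="DEBUG" "Outbound message: [32056][Sent: HTTP]" threadId="40744"'

# Literal replacement: the text level="\w+" never occurs verbatim in the line,
# so nothing is replaced.
literal = line.replace(r'level="\w+"', '')

# Regex replacement: remove optional leading whitespace plus either attribute.
cleaned = re.sub(r'\s*(?:level|threadId)="[^"]+"', '', line)

print(literal == line)  # True
print(cleaned)
```

The second call prints the expected line from the question, with both attributes and the whitespace in front of them removed.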

Related

what does this $tok =~ s{\\(.)|([\$\#]|\\$)}{'\\'.($2 || $1)}sge; (perl code) mean?

What does this mean?
$tok =~ s{\\(.)|([\$\#]|\\$)}{'\\'.($2 || $1)}sge;
This comes from a CVE study blog whose code is written in Perl. I know this is a regular expression, and that the content in the second {} should replace what the first {} matches, but I do NOT get what '\\'.($2 || $1) means.
$tok =~ s{\\(.)|([\$\#]|\\$)}{'\\'.($2 || $1)}sge;
It is a substitution operator s/// applied to the string $tok, with the modifiers sge. The delimiters of the operator have been changed from / to {}. Let's break that regex down
s{
\\(.) # (1) match a backslash followed by 1 character, capture
| # (2) or
( # (3) start capture parens
[\$\#] # (4) either a literal $ or #
| # (5) or
\\$ # (6) backslash at the end of line (including newline)
) # end capture parens
}{ # replace with
'\\'.($2 || $1)} # (7) backslash concatenated with either capture 2 or 1
sge; # (8) s = . matches newline, g = match multiple times, e = eval
Judging (at a glance) from the rest of that blog code, this code is not written by someone skilled at Perl. So I will take their comments at face value:
# must protect unescaped "$" and "#" symbols, and "\" at end of string
The eval (8) is apparently to concatenate a backslash with either capture group 2 (2) or 1 (1), depending on which is "true". Or rather, which one matched the string.
Looking closer at the code, (1) and (6) are very similar. The latter one will trigger only at the end of a line that does not have a newline, whereas the first one will handle all other cases, including end of line with a newline (because of /s modifier).
(1) will match any escaped character, so \1, or \$ or \\ anything with a backslash followed by a character. If we look at the replacement part (7), we see that this capture group is the fallback, which will only trigger if the second capture group fails. The second capture group also only matches if the first fails. Confusing? Maybe a little.
(2) triggers if the matching character is not a backslash followed by a character. Now we are looking for a literal $ or #. Or failing that, a backslash at the end of line. But wait a minute, we already checked for backslash? Yes, but this is an edge case.
In the case of (1) matching, $2 will be undefined, and $1, the first capture group, a single character, will be put back into the text. The backslash that was before it will be removed in (1), and then put back in (7). This will not really do anything, just make the regex not destroy already escaped characters.
In the case of (2) matching, it will either be an end of line backslash that is consumed (6) and put back (7), or it will be a $ or # which is consumed (4) and put back (7), with a backslash in front.
So basically what the OP says in the comment is happening.
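To see the behaviour described above in action, here is a rough Python equivalent of the substitution. It is a sketch, not the blog's code: the function name `protect` is invented for the example, `re.S` plays the role of the /s modifier, and the lambda plays the role of the /e replacement expression `'\\'.($2 || $1)`.

```python
import re

# Same alternation as the Perl regex: an escaped character, or an
# unescaped $ / #, or a backslash at the end of the string.
PATTERN = re.compile(r'\\(.)|([$#]|\\$)', re.S)

def protect(tok):
    # Put a backslash in front of whichever group matched (group 2 first,
    # falling back to group 1), like '\\' . ($2 || $1) in Perl.
    return PATTERN.sub(lambda m: '\\' + (m.group(2) or m.group(1)), tok)

print(protect('a$b'))     # a\$b   ($ gets escaped)
print(protect('a\\$b'))   # a\$b   (already escaped, left alone)
print(protect('end\\'))   # end\\  (trailing backslash gets escaped)
```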

Find and replace a href value with PowerShell?

I have a HTML file with a load of links in it.
They are in the format
http:/oldsite/showFile.asp?doc=1234&lib=lib1
I'd like to replace them with
http://newsite/?lib=lib1&doc=1234
(1234 and lib1 are variable)
Any idea on how to do that?
Thanks
P
I don't think your examples are correct.
http:/oldsite/showFile.asp?doc=1234&amp;lib=lib1 should be
http:/oldsite/showFile.asp?doc=1234&lib=lib1
and
http://newsite/?lib=lib1&doc=1234 should be http://newsite?lib=lib1&doc=1234
To do the replacement on these, you can do
'http:/oldsite/showFile.asp?doc=1234&lib=lib1' -replace 'http:/oldsite/showFile\.asp\?(doc=\d+)&(lib=\w+)', 'http://newsite?$2&$1'
which returns http://newsite?lib=lib1&doc=1234
To replace these in a file you can use:
(Get-Content -Path 'X:\TheHtmlFile.html' -Raw) -replace 'http:/oldsite/showFile\.asp\?(doc=\d+)&(lib=\w+)', 'http://newsite?$2&$1' |
Set-Content -Path 'X:\TheNewHtmlFile.html'
Regex details:
http:/oldsite/showFile Match the characters “http:/oldsite/showFile” literally
\. Match the character “.” literally
asp Match the characters “asp” literally
\? Match the character “?” literally
( Match the regular expression below and capture its match into backreference number 1
doc= Match the characters “doc=” literally
\d Match a single digit 0..9
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
& Match the character “&” literally
( Match the regular expression below and capture its match into backreference number 2
lib= Match the characters “lib=” literally
\w Match a single character that is a “word character” (letters, digits, etc.)
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
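The group swap can also be tried without touching any file. The following is a small Python sketch of the same pattern, offered only as an illustration: `rewrite_links` is a hypothetical helper name, and Python writes the backreferences as `\2` and `\1` where PowerShell uses `$2` and `$1`.

```python
import re

# Group 1 captures "doc=<digits>", group 2 captures "lib=<word chars>".
OLD_LINK = re.compile(r'http:/oldsite/showFile\.asp\?(doc=\d+)&(lib=\w+)')

def rewrite_links(html):
    # Emit the two captured query parameters in swapped order.
    return OLD_LINK.sub(r'http://newsite?\2&\1', html)

sample = '<a href="http:/oldsite/showFile.asp?doc=1234&lib=lib1">file</a>'
print(rewrite_links(sample))
```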
Read in the file, loop through each line, replace the old value with the new value, and send the output to a new file:
gc file.html | % { $_.Replace('oldsite...','newsite...') } | out-file new-file.html

Regex expression for detecting 2 consecutive words when first word starts with #

I want a regex that detects names starting with #. For example, in the sentence "Hi #Steve Rogers, how are you?", I want to extract #Steve Rogers. I tried Pattern.compile("#\\s*(\\w+)").matcher(text), but only "#Steve" gets detected. What else should I use?
Thanks
Try (#[\w\s]+)
It will only capture word and spaces after the #
See example at https://regex101.com/r/4Pv9bu/1
If you don't want to match a # sign that is followed only by a space, and there can be a second word after the first one:
(?<!\S)#\w+(?:\h+\w+)?
Explanation
(?<!\S) Assert a whitespace boundary to the left
# Match literally
\w+ Match 1+ word characters
(?:\h+\w+)? Optionally match 1+ horizontal whitespace chars and 1+ word chars
Regex demo
In Java
String regex = "(?<!\\S)#\\w+(?:\\h+\\w+)?";
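For a quick check of that pattern, here is a sketch in Python rather than Java. One assumption to note: Python's `re` module has no `\h`, so `[ \t]` is used as a stand-in for horizontal whitespace (it covers space and tab only). The name `find_tags` is invented for the example.

```python
import re

# (?<!\S) requires start of string or whitespace before the #.
# [ \t]+ stands in for Java's \h+ here.
TAG = re.compile(r'(?<!\S)#\w+(?:[ \t]+\w+)?')

def find_tags(text):
    return TAG.findall(text)

print(find_tags("Hi #Steve Rogers, how are you?"))  # ['#Steve Rogers']
print(find_tags("a lone # sign, or mid#word"))      # []
```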

How would I change multiple filenames in Powershell?

I am trying to remove parts of a name "- xx_xx" from the end of multiple files. I'm using this and it works well.
dir | Rename-Item -NewName { $_.Name -replace " - xx_xx","" }
However, there are other parts like:
" - yy_yy"
" - zz_zz"
What can I do to remove all of these at once instead of running it again and again changing the part of the name I want removed?
Easiest way
You can keep on stringing -replace statements until the cows come home, if you need to.
$myLongFileName = "Something xx_xx yy_yy zz_zz" -replace "xx_xx","" -replace "yy_yy"
More Terse Syntax
If every file has these, you can also make an array of the pieces you want to remove, separating them with commas, and join them with | so that -replace sees a single alternation pattern.
$stuffWeDontWantInOurFile = @("xx_xx", "yy_yy", "zz_zz")
$myLongFileName -replace ($stuffWeDontWantInOurFile -join '|'), ""
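One way to remove several fixed pieces in a single pass is to combine them into one alternation. The sketch below shows the idea in Python, as an illustration only; the list contents come from the question, while `strip_pieces` is a hypothetical helper. Each piece is passed through `re.escape` so that it is matched literally.

```python
import re

# The fixed suffixes from the question, joined as piece1|piece2|piece3.
pieces = [" - xx_xx", " - yy_yy", " - zz_zz"]
pattern = re.compile("|".join(re.escape(p) for p in pieces))

def strip_pieces(name):
    # Remove every occurrence of any listed piece.
    return pattern.sub("", name)

print(strip_pieces("report - xx_xx"))           # report
print(strip_pieces("notes - yy_yy - zz_zz"))    # notes
```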
Yet another way
If your file elements are separated by spaces or dashes or something predictable, you can split the file name on that.
$myLongFileName = "Something xx_xx yy_yy zz_zz"
PS> $myLongFileName.Split()
Something
xx_xx
yy_yy
zz_zz
PS> $myLongFileName.Split()[0] #select just the first piece
Something
For spaces, you use the Split() method with no arguments.
If it were dashes or another character, you'd provide it like so Split("-"). Between these techniques, you should be able to do what you want to do.
If as you say, the pattern - xx_xx is always at the end of the file name, I'd suggest using something like this:
Get-ChildItem -Path '<TheFolderWhereTheFilesAre>' -File |
Rename-Item -NewName {
'{0}{1}' -f ($_.BaseName -replace '\s*-\s*.._..$'), $_.Extension
} -WhatIf
Remove the -WhatIf switch once you are satisfied with the results shown in the console.
Result:
D:\Test\blah - xx_yy.txt --> D:\Test\blah.txt
D:\Test\somefile - zy_xa.txt --> D:\Test\somefile.txt
Regex details:
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
- Match the character “-” literally
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
. Match any single character that is not a line break character
. Match any single character that is not a line break character
_ Match the character “_” literally
. Match any single character that is not a line break character
. Match any single character that is not a line break character
$ Assert position at the end of the string (or before the line break at the end of the string, if any)
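The effect of that pattern on a file name can be checked without renaming anything. This is a minimal Python sketch under the same assumption as the answer (the " - xx_xx" part sits at the end of the base name); `new_name` is a hypothetical helper and `os.path.splitext` stands in for the BaseName and Extension properties.

```python
import os
import re

# Optional whitespace, a dash, optional whitespace, then two characters,
# an underscore and two characters at the very end of the base name.
SUFFIX = re.compile(r'\s*-\s*.._..$')

def new_name(filename):
    base, ext = os.path.splitext(filename)
    return SUFFIX.sub('', base) + ext

print(new_name('blah - xx_yy.txt'))      # blah.txt
print(new_name('somefile - zy_xa.txt'))  # somefile.txt
```

Names that do not end in such a suffix come back unchanged, which makes the function safe to run over a whole folder listing first.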

One regex to capture words separated by one space-character in combination with the opposite capture occurrences more than one space-character

I would like a single regex that captures runs of words separated by exactly one space character and, as the counterpart, also captures each run of more than one space character.
I would like to have the following example covered:
This line with  sometimes more than  1  space needs to be captured in 3 matches with 2 groups.
I expect the following groups:
([This line with][ ])([sometimes more than][ ])([1][ ])space needs to be captured in 3 matches with 2 groups.
To capture one of the two is no problem.
i.e.
to capture more than one space-char:
([\s]{2,})
and to capture words separated by only one space-char(see https://stackoverflow.com/a/60288115/3710053):
\S+(?:\s\S+)*
You might use an alternation to match either a word followed by a repeating pattern of a single space and a word OR match 2 or more spaces
\S+(?: \S+)*| {2,}
Explanation
\S+ Match 1+ non whitespace chars
(?: \S+)* Repeat 0+ times matching a space and 1+ non whitespace chars
| Or
{2,} Repeat 2 or more times matching a space
Regex demo
If you want to match whitespace chars instead, you could replace the space with \s but note that it could also possibly match newlines.
Edit
For the updated question, you could use 2 capturing groups:
(\S+(?: \S+)*)( {2,})
Explanation
( Capture group 1
\S+ Match 1+ non whitespace chars
(?: \S+)* Repeat 0+ times matching a space and 1+ non whitespace chars
) Close group 1
( Capture group 2
{2,} Match 2 or more spaces
) Close group 2
Regex demo
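The two-group pattern can be exercised with a short script. The sketch below uses Python purely for illustration; the sample sentence is a made-up variant of the one in the question, with the multi-space gaps written out explicitly.

```python
import re

# Group 1: words separated by single spaces. Group 2: the run of 2+ spaces after them.
PAIR = re.compile(r'(\S+(?: \S+)*)( {2,})')

text = "This line with  sometimes more than   1  space needs"

for words, gap in PAIR.findall(text):
    print(repr(words), len(gap))
# 'This line with' 2
# 'sometimes more than' 3
# '1' 2
```

The trailing "space needs" produces no match because it is not followed by two or more spaces, which gives the three matches the question asks for.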