How do I match "|" in a regular expression in PowerShell? - powershell

I want to use a regular expression to filter out if a string contains one of "&" or "|" or "=". I tried:
$compareRegex = [String]::Join("|", @("&","|", "="));
"mydfa" -match $compareStr
PowerShell prints "True". This is not what I wanted, and it seems "|" itself has confused PowerShell for a matching. How do I fix it?

@Kayasax's answer would do in this case (thus +1); I just wanted to suggest a more general solution.
First of all: you are not using the pattern that you've just created. I suspect $compareStr is $null, thus it will match anything.
To the point: if you want to create a pattern that matches characters/strings and you can't predict whether any of them will be, or contain, a special character, just use [regex]::Escape() on every item you want to match against:
$patternList = "&","|", "=" | ForEach-Object { [regex]::Escape($_) }
$compareRegex = $patternList -join '|'
"mydfa" -match $compareRegex
This way the input can be dynamic, and you won't end up with a pattern that matches anything.
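As a minimal sketch of the escaping approach with a wider input list (the `$literals` list, including the extra item 'a.b', is made up for illustration), this is roughly what the generated pattern looks like and how it behaves:

```powershell
# Hypothetical list of literals; any of them may contain regex metacharacters.
$literals = '&', '|', '=', 'a.b'

# Escape each item, then join the results with the alternation operator.
$pattern = ($literals | ForEach-Object { [regex]::Escape($_) }) -join '|'

$pattern                  # &|\||=|a\.b
'mydfa' -match $pattern   # False: none of the literals occurs
'x=1'   -match $pattern   # True: contains "="
'axb'   -match $pattern   # False: the dot in "a.b" is now literal
```

Only the items are escaped; the joining | stays unescaped so that it keeps acting as the alternation operator.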

The | has a special meaning in regular expressions. Alternations (lists of alternative matches) are separated by this character. For instance, the expression
a|b|c
would match either a or b or c.
For matching a literal | you need to escape it with backslash (\|) or put it in a character class ([|]), so your expression should look like this:
"mydfa" -match "\||&|="
or like this:
"mydfa" -match "[|&=]"
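To see why the unescaped pipe is a problem, here is a small sketch (variable name `$unescaped` is just illustrative) of what the Join call from the question actually builds, compared with the two corrected patterns:

```powershell
# What the question's Join call produces: a pattern with empty alternatives.
$unescaped = [String]::Join('|', @('&', '|', '='))
$unescaped                  # &|||=
'mydfa' -match $unescaped   # True: an empty alternative matches anywhere

# The escaped pipe and the character class behave the same for single characters.
'mydfa' -match '\||&|='     # False
'mydfa' -match '[|&=]'      # False
'a|b'   -match '[|&=]'      # True
```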

Related

Parse info from Text File - Powershell

Beginner here. I am working on an error log file and library; the current step I am on is to pull specific information from a txt file.
The code I have currently is...
$StatusErr = "Type 1","Type 2"
for ($i=0; $i -lt $StatusErr.length; $i++) {
    get-content C:\blah\Logs\StatusErrors.TXT |
        select-string $StatusErr[$i] |
        add-content C:\blah\Logs\StatusErrorsresult.txt
}
while it is working, I need it to display as
Type-1-Description
2-Description
Type-1-Description
2-Description
Type-1-Description
2-Description
etc.
it is currently displaying as
Type 1 = Type-1-Description
Type 1 = Type-1-Description
Type 1 = Type-1-Description
Type 2 = 2-Description
Type 2 = 2-Description
Type 2 = 2-Description
I am unsure how to change the arrangement and remove unneeded spaces and the = sign
You need to search for both patterns in a single Select-String call in order to get matching lines in order.
While the -Pattern parameter does accept an array of patterns, in this case a single regex will do.
You need to use a regex pattern in order to capture and output only part of the lines that match.
$StatusErrRegex = '(?<=Type [12]\s*=\s*)[^ ]+'
get-content C:\blah\Logs\StatusErrors.TXT |
select-string $StatusErrRegex |
foreach-object { $_.Matches.Value } |
set-content C:\blah\Logs\StatusErrorsresult.txt
Note that I've replaced add-content with set-content, as I'm assuming you don't want to append to a preexisting file. set-content writes all objects it receives via the pipeline to the output file.
Select-String outputs Microsoft.PowerShell.Commands.MatchInfo instances whose .Matches property provides access to the part of the line that was matched.
For an explanation of the regex and the ability to experiment with it, see this regex101.com page.
Additional notes:
Select-String, like PowerShell in general, is case-insensitive by default; add the -CaseSensitive switch, if needed.
(?<=...) is a (positive) lookbehind assertion, whose matching text doesn't become part of what the regex captures.
\s* matches zero or more whitespace characters; \s+ would match one or more.
[^ ]+ matches one or more (+) characters that are not (^) spaces ( ), and thereby captures the run of non-space characters to the right of the = sign.
To match any of multiple words at the start of the pattern, use a regex alternation (|), e.g. '(?<=(type|data) [12]\s*=\s*)[^ ]+'
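To try the lookbehind pattern without touching any files, here is a small sketch that feeds in-memory strings to Select-String. The sample lines are made up to resemble the question's log format, and the 'Type 3' line is only there to show a non-match:

```powershell
# Sample lines modeled on the question's log format (assumed, not real data).
$lines = 'Type 1 = Type-1-Description',
         'Type 2 = 2-Description',
         'Type 3 = ignored'

$regex = '(?<=Type [12]\s*=\s*)[^ ]+'

# Select-String keeps only matching lines, in input order;
# .Matches.Value is the part after the lookbehind.
$result = $lines | Select-String $regex | ForEach-Object { $_.Matches.Value }
$result   # Type-1-Description, 2-Description
```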

To check for special characters in a string

I need to find special character in a string which has alphanumeric values in it.
I have tried the below code snippet but it doesn't work.
$Special_characters = ('\n|\r|\t|\a|\"|\`')
$Value = "g63evsy3swisnwhd83bs3hs9sn329hs\t"
if($Value -match $Special_characters)
{
    Write-Host "Special characters are present"
}
else
{
    Write-Host "special characters are absent"
}
The output says "special characters are absent" even though there are special characters at the end. How to resolve it?
$Special_Characters here is a string, so your code is searching for the whole word (\n|\r|\t|\a|\"|`) to be found in $Value, which is not found.
Instead of string, you have to use array as follows:
$Value = "g63evsy3swisnwhd83bs3hs9sn329hs\t"
$Special_Characters = @('\\n','\\r','\\t','\\a','\\"','\\`')
$Special_Characters | Foreach-Object {
    if ($Value -match $_) {
        "$_ is present"
    } else {
        "$_ is not present"
    }
}
Note
You have to use a double backslash (\\) because the backslash is the escape character in regular expressions, so \\ is what matches a single literal backslash in $Value.
There is a misunderstanding here.
The backslash is used to define a special character in a regular expression, e.g. \t defines a tab.
But this is not the case for PowerShell. To define a special character in PowerShell you need to use the backtick character (See: About Special Characters), e.g. a Tab is written as `t.
In other words, the regular expression pattern in the question is correct but the input string is not (in contrast to what the question/title suggests, there is in fact no special character in the given input string).
It should be:
"...hs9sn329hs`t" -match '\n|\r|\t|\a|\"|\`'
True
As it concerns a list of single (special) characters, you might also consider a bracket expression (rather than an OR "pipe" character) for this:
"...hs9sn329hs`t" -match '[\n\r\t\a\"\`]'
True
Vice versa: it is allowed to put special characters directly into a regular expression pattern by using double quotes, so that PowerShell evaluates the string (but I recommend against this):
"...hs9sn329hs`t" -match "`n|`r|`t|`a|`"|``"
True
If the input string in the question really is the string you want to check (implying that you consider the backslash a special character, which formally it is not), you want to check for a literal \t rather than a tab. For this you need to escape the backslashes in your regular expression:
"...hs9sn329hs\t" -match '\\n|\\r|\\t|\\a|\\"|\\`'
True
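The difference between the two escape characters can be checked side by side. This is only a sketch with made-up variable names, contrasting a string that contains a real tab with one that contains a backslash followed by t:

```powershell
$withTab        = "abc`t"   # backtick escape: a real tab character
$withBackslashT = 'abc\t'   # backslash and "t" as two ordinary characters

$withTab.Length                # 4
$withBackslashT.Length         # 5

$withTab        -match '\t'    # True:  regex \t matches the tab
$withBackslashT -match '\t'    # False: there is no tab in this string
$withBackslashT -match '\\t'   # True:  \\ matches the literal backslash
```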
It's a one-liner:
$Special_characters = '\n|\r|\t|\a|\"|\`'
$Value = "g63evsy3swisnwhd83bs3hs9sn329hs\t"
$result = @($Special_characters -split '\|' | % { $Value.Contains( $_ ) }).Contains( $true )
$result is true when a special character is found, otherwise false.
Here's all the special characters you referred to. You can try out a string by itself just to see if it works. It must be double quoted.
PS /Users/js> "`n`r`t`a`"``"
"`
You can also try out the -match operator by itself.
PS /Users/js> "`n`r`t`a`"``" -match '\n|\r|\t|\a|\"|\`'
True
About special characters: https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_special_characters?view=powershell-6

Split & Trim in a single step

In PS 5.0 I can split and trim a string in a single line, like this
$string = 'One, Two, Three'
$array = ($string.Split(',')).Trim()
But that fails in PS 2.0. I can of course do a foreach to trim each item, or replace ', ' with ',' before doing the split, but I wonder if there is a more elegant approach that works in all versions of PowerShell?
Failing that, the replace seems like the best approach to address all versions with a single code base.
TheMadTechnician has provided the crucial pointer in a comment on the question:
Use the -split operator, which works the same in PSv2: It expects a regular expression (regex) as the separator, allowing for more sophisticated tokenizing than the [string] type's .Split() method, which operates on literals:
PS> 'One, Two, Three' -split ',\s*' | ForEach-Object { "[$_]" }
[One]
[Two]
[Three]
Regex ,\s* splits the input string by a comma followed by zero or more (*) whitespace characters (\s).
In fact, choosing -split over .Split() is advisable in general, even in later PowerShell versions.
However, to be fully equivalent to the .Trim()-based solution in the question, trimming of leading and trailing whitespace is needed too:
PS> ' One, Two,Three ' -split ',' -replace '^\s+|\s+$' | ForEach-Object { "[$_]" }
[One]
[Two]
[Three]
-replace '^\s+|\s+$' removes the leading and trailing whitespace from each token resulting from the split: | specifies an alternation, so that a match of the subexpression on either side of it counts as a match; ^\s+ matches leading whitespace, \s+$ matches trailing whitespace; \s+ represents a non-empty (one or more, +) run of whitespace characters; for more information about the -replace operator, see this answer.
(In PSv3+, you could simplify to (' One, Two,Three ' -split ',').Trim() or use the solution from the question.
To also weed out empty / all-whitespace elements, append -ne '')
As for why ('One, Two, Three'.Split(',')).Trim() doesn't work in PSv2: The .Split() method returns an array of tokens, and invoking the .Trim() method on that array - as opposed to its elements - isn't supported in PSv2.
In PSv3+, the .Trim() method call is implicitly "forwarded" to the elements of the resulting array, resulting in the desired trimming of the individual tokens - this feature is called member-access enumeration.
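A short sketch of the operator chain described above, using a made-up input that also contains an all-whitespace element, so the effect of appending -ne '' is visible:

```powershell
$raw = ' One, , Two,Three '

# Split on commas, then strip leading/trailing whitespace from every token.
$trimmed = $raw -split ',' -replace '^\s+|\s+$'
$trimmed.Count        # 4 (the second token is now an empty string)

# Appending -ne '' filters out the empty tokens.
$nonEmpty = $raw -split ',' -replace '^\s+|\s+$' -ne ''
$nonEmpty -join '|'   # One|Two|Three
```

The chain relies on -split, -replace and -ne acting on arrays element by element and being evaluated left to right.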
I don't have PS 2.0 but you might try something like
$string = 'One, Two, Three'
$array = ($string.Split(',') | % { $_.Trim() })
and see if that suits. This is probably of less help to you, but future readers who have moved to later versions can use the #Requires statement. See help about_Requires to determine whether your platform supports this feature.

How to -replace continuous special characters?

In PowerShell when trying to replace
"Columnname1||colunnname2||kjhsadjhj|kjsad" -replace "[||]", "','"
above command is doing
"Columnname1','','colunnname2','','kjhsadjhj','kjsad"
but I'd like to replace the exact match like below
"Columnname1','colunnname2','kjhsadjhj|kjsad"
Your code doesn't do what you want because your search pattern defines a character class. Square brackets in a regular expression will match exactly one occurrence of any of the enclosed characters, even if you specify a character multiple times. [||] will thus match exactly one | character.
Since you apparently don't actually want to use a regular expression match I'd recommend doing a normal string replacement via the Replace() method rather than a regular expression replacement via the -replace operator:
"Columnname1||colunnname2||kjhsadjhj|kjsad".Replace('||', "','")
If you want to stick with a regular expression replacement you must specify two literal | characters, either by escaping them, as PetSerAl suggested
"Columnname1||colunnname2||kjhsadjhj|kjsad" -replace '\|\|', "','"
or by putting each of them in its own character class
"Columnname1||colunnname2||kjhsadjhj|kjsad" -replace '[|][|]', "','"
The regex pattern [||] means "one of | or |", i.e. a single pipe character.
Change it to \|{2} to match two consecutive pipes:
"Columnname1||colunnname2||kjhsadjhj|kjsad" -replace "\|{2}", "','"
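The variants discussed in this thread can be compared on a shortened, made-up input. A sketch:

```powershell
$s = 'a||b||c|d'

$s -replace '[||]', ','     # a,,b,,c,d  (every single pipe is replaced)
$s -replace '\|{2}', ','    # a,b,c|d    (only double pipes are replaced)
$s -replace '[|][|]', ','   # a,b,c|d
$s.Replace('||', ',')       # a,b,c|d    (literal replacement, no regex)
```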
Just in case your end result should be:
'Columnname1','colunnname2','kjhsadjhj|kjsad'
$string = '"Columnname1||colunnname2||kjhsadjhj|kjsad"'
$string
$String = $string -replace '^"|"$',"'" -replace '\|{2}',"','"
$string
Sample output:
"Columnname1||colunnname2||kjhsadjhj|kjsad"
'Columnname1','colunnname2','kjhsadjhj|kjsad'

How can I replace every comma with a space in a text file before a pattern using PowerShell

I have a text file with lines in this format:
FirstName,LastName,SSN,$x.xx,$x.xx,$x.xx
FirstName,MiddleInitial,LastName,SSN,$x.xx,$x.xx,$x.xx
The lines could be in either format. For example:
Joe,Smith,123-45-6789,$150.00,$150.00,$0.00
Jane,F,Doe,987-65-4321,$250.00,$500.00,$0.00
I want to basically turn everything before the SSN into a single field for the name thus:
Joe Smith,123-45-6789,$150.00,$150.00,$0.00
Jane F Doe,987-65-4321,$250.00,$500.00,$0.00
How can I do this using PowerShell? I think I need to use ForEach-Object and at some point replace "," with " ", but I don't know how to specify the pattern. I also don't know how to use a ForEach-Object with a $_.Where so that I can specify the "SkipUntil" mode.
Thanks very much!
Mathias is correct; you want to use the -replace operator, which uses regular expressions. I think this will do what you want:
$string -replace ',(?=.*,\d{3}-\d{2}-\d{4})',' '
The regular expression uses a lookahead (?=) to look for any commas that are followed by any number of any character (. is any character, * is any number of them including 0) that are then followed by a comma immediately followed by an SSN (\d{3}-\d{2}-\d{4}). The concept of "zero-width assertions", such as this lookahead, simply means that it is used to determine the match, but is not actually returned as part of the match.
That's how we're able to match only the commas in the names themselves, and then replace them with a space.
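Applied to the two example lines from the question, the replacement can be sketched like this (single quotes keep PowerShell from expanding the $ signs; the variable names are illustrative):

```powershell
$lines = 'Joe,Smith,123-45-6789,$150.00,$150.00,$0.00',
         'Jane,F,Doe,987-65-4321,$250.00,$500.00,$0.00'

# Only commas that still have ",<SSN>" somewhere to their right are replaced.
$fixed = $lines -replace ',(?=.*,\d{3}-\d{2}-\d{4})', ' '
$fixed
# Joe Smith,123-45-6789,$150.00,$150.00,$0.00
# Jane F Doe,987-65-4321,$250.00,$500.00,$0.00
```

The comma directly in front of the SSN is left alone because no further ",<SSN>" sequence follows it.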
I know it's answered, and neatly so, but I tried to come up with an alternative to using a regex - count the number of commas in a line, then replace either the first one, or the first two, commas in the line.
But strings can't count how many times a character appears in them without using the regex engine(*), and replacements can't be done a specific number of times without using the regex engine(**), so it's not very neat:
$comma = [regex]","
Get-Content data.csv | ForEach {
$numOfCommasToReplace = $comma.Matches($_).Count - 4
$comma.Replace($_, ' ', $numOfCommasToReplace)
} | Out-File data2.csv
Avoiding the regex engine entirely, just for fun, gets me things like this:
Get-Content .\data.csv | ForEach {
$1,$2,$3,$4,$5,$6,$7 = $_ -split ','
if ($7) {"$1 $2 $3,$4,$5,$6,$7"} else {"$1 $2,$3,$4,$5,$6"}
} | Out-File data2.csv
(*) ($line -as [char[]] -eq ',').Count
(**) while ( #counting ) { # split/mangle/join }
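For a single line, the counting approach and footnote (*) can be sketched as follows. The variable names are made up, and the subtraction of 4 assumes, as in the answer, that exactly four commas follow the name fields:

```powershell
$line  = 'Jane,F,Doe,987-65-4321,$250.00,$500.00,$0.00'
$comma = [regex]','

# Two ways of counting the commas: regex engine vs. char-array filtering.
$viaRegex = $comma.Matches($line).Count           # 6
$viaChars = ($line -as [char[]] -eq ',').Count    # 6

# Replace only the first (count - 4) commas, i.e. those inside the name.
$joined = $comma.Replace($line, ' ', $viaRegex - 4)
$joined   # Jane F Doe,987-65-4321,$250.00,$500.00,$0.00
```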