Extracting substring from a string - powershell

I'm struggling to extract the values within square brackets from the strings below using PowerShell:
in relation to any Facility C Loan [?10%?] per cent. per annum;
"Facility A Commitments" means the aggregate of the Facility A Commitments, being [????????10 million?????] at the date of this Agreement.
Output required:
10%
10 million

With a single, multiline string in memory (PSv4+):
$str = @'
in relation to any Facility C Loan [?10%?] per cent. per annum;
"Facility A Commitments" means the aggregate of the Facility A Commitments, being [????????10 million?????] at the date of this Agreement.
'@
[regex]::matches($str,'\[\?+([^?]+)\?+\]').ForEach({ $_.Groups[1].Value })
Using the pipeline with Get-Content and Select-String for line-by-line processing (PSv3+):
$lines = @'
in relation to any Facility C Loan [?10%?] per cent. per annum;
"Facility A Commitments" means the aggregate of the Facility A Commitments, being [????????10 million?????] at the date of this Agreement.
'@ -split '\r?\n'
# Substitute your `Get-Content someFile.txt` call for $lines
$lines |
Select-String '\[\?+([^?]+)\?+\]' |
ForEach-Object { $_.Matches.Groups[1].Value }
Explanation of regex \[\?+([^?]+)\?+\]:
\[ matches a literal [
\?+ matches one or more (+) literal ?
([^?]+) is a capture group ((...)) that matches one or more (+) characters from the set of characters ([...]) that are not (^) part of the set, i.e., any character that is not the ? character - this is the value of interest to extract.
\?+ matches one or more literal ?
\] matches a literal ]
Both [regex]::Matches() and the .Matches property of the objects that Select-String emits return a collection of [System.Text.RegularExpressions.Match] objects, whose .Groups property contains both the full match (index 0) and what each capture group captured (index 1 containing the 1st capture group's value, and so on).
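To make the difference between index 0 and index 1 of .Groups concrete, here is a minimal sketch. The variable names $sample and $m are only illustrative, and the sample string is a shortened version of the question's first line.

```powershell
# Shortened sample input, modeled on the first line from the question.
$sample = 'any Facility C Loan [?10%?] per cent. per annum;'

# [regex]::Match() returns the first match only, which is enough for this illustration.
$m = [regex]::Match($sample, '\[\?+([^?]+)\?+\]')

$m.Success          # True
$m.Groups[0].Value  # the full match, brackets and ? included:  [?10%?]
$m.Groups[1].Value  # what the 1st capture group captured:      10%
```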

This is your regex for both cases:
(?<=\[\?+)[^\?]*(?=\?+\])
You can play with it at https://regex101.com
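Note that the PCRE flavors on that site do not support variable-width lookbehinds (the first +), so the pattern is rejected there. The .NET regex engine that PowerShell uses does support them, so it should work in PowerShell.
A useful reference on lookarounds: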
https://www.regular-expressions.info/lookaround.html

For the first one, run:
$message -match '\[\?(\d*%)\?\]'
echo $Matches[1]
For the second one:
$message -match '\[\?*(\d* million)\?*\]'
echo $Matches[1]
In each iteration you can simply check whether $message -match '...' returns $True and, if so, read the values from the $Matches variable (an automatic variable that holds the result of the most recent successful regex match).
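The per-iteration check described above could look roughly like this. It is a sketch only: the $lines array is hypothetical sample data, and it uses the more general pattern from the first answer so that one regex covers both inputs.

```powershell
# Hypothetical sample lines; the third one contains no bracketed value.
$lines = @(
    'in relation to any Facility C Loan [?10%?] per cent. per annum;'
    'being [????????10 million?????] at the date of this Agreement.'
    'a line without any bracketed value'
)

$values = foreach ($line in $lines) {
    # -match returns $true or $false; $Matches is only populated on success.
    if ($line -match '\[\?+([^?]+)\?+\]') {
        $Matches[1]
    }
}

$values   # 10% and 10 million
```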

Related

Powershell - Need to recognize if there is more than one result (regex)

I am using this to find if file name contains exactly 7 digits
if ($file.Name -match '\D(\d{7})(?:\D|$)') {
$result = $matches[1]
}
The problem is when a file name contains two groups of 7 digits, for example:
patch-8.6.22 (1329214-1396826-Increase timeout.zip
In this case the result will be the first one (1329214).
In most cases there is only one number, so the regex works, but I need to recognize when there is more than one group and integrate that check into the if ().
The -match operator only ever looks for one match.
To get multiple ones, you must currently use the underlying .NET APIs directly, specifically [regex]::Matches():
Note: There's a green-lighted proposal to implement a -matchall operator, but as of PowerShell 7.3.0 it hasn't been implemented yet - see GitHub issue #7867.
# Sample input.
$file = [pscustomobject] @{ Name = 'patch-8.6.22 (1329214-1396826-Increase timeout.zip' }
# Note:
# * If *nothing* matches, $result will contain $null
# * If *one* substring matches, $result will be a single string.
# * If *two or more* substrings match, $result will be an *array* of strings.
$result = ([regex]::Matches($file.Name, '(?<=\D)\d{7}(?=\D|$)')).Value
.Value uses member-access enumeration to extract matching substrings, if any, from the elements of the collection returned by [regex]::Matches().
I've tweaked the regex to use lookaround assertions ((?<=...) and (?=...)) so that only the substrings of interest are captured.
See this regex101.com page for an explanation of the regex and the ability to experiment with it.
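To act on the number of matches, one option is to keep the match collection itself and branch on its .Count property. This is a sketch under that assumption; the variable names $name, $found and $summary and the message texts are purely illustrative.

```powershell
$name = 'patch-8.6.22 (1329214-1396826-Increase timeout.zip'

# Keep the MatchCollection itself so that .Count is reliable even with zero matches.
$found = [regex]::Matches($name, '(?<=\D)\d{7}(?=\D|$)')

$summary = switch ($found.Count) {
    0       { 'no 7-digit group' }
    1       { "exactly one: $($found[0].Value)" }
    default { "several: $($found.Value -join ', ')" }
}

$summary   # several: 1329214, 1396826
```

Counting on the collection avoids wrapping .Value in @(...), which would report a count of 1 when nothing matched.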

Powershell number format

I am creating a script that converts a CSV file into another format.
To do so, I need my numbers to have a fixed format to respect the column size: 00000000000000000,00 (20 characters, 2 digits after the comma).
I have tried to format the number with -f and with the method $value.ToString("#################.##"), without success.
Here is an example input:
4000000
45817,43
400000
570425,02
15864155,69
1068635,69
128586256,9
8901900,04
29393,88
126858346,88
1190011,46
2358411,95
139594,82
13929,74
11516,85
55742,78
96722,57
21408,86
717,01
54930,49
391,13
2118,64
Any hints are welcome :)
Thank you !
tl;dr:
Use 0 instead of # in the format string:
PS> $value = 128586256.9; $value.ToString('00000000000000000000.00')
00000000000128586256.90
Note:
Alternatively, you could construct the format string as an expression:
$value.ToString('0' * 20 + '.00')
The resulting string reflects the current culture with respect to the decimal mark; e.g., with fr-FR (French) in effect, , rather than . would be used; you can pass a specific [cultureinfo] object as the second argument to control what culture is used for formatting; see the docs.
As in your question, I'm assuming that $value already contains a number, which implies that you've already converted the CSV column values - which are invariably strings - to numbers.
To convert a string culture-sensitively to a number, use [double]::Parse('1,2'), for instance (this method too has an overload that allows specifying what culture to use).
Caveat: By contrast, a PowerShell cast (e.g. [double] '1.2') is by design always culture-invariant and only recognizes . as the decimal mark, irrespective of the culture currently in effect.
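Putting the parsing and formatting notes together, here is a sketch that reads comma-decimal strings and emits the 20-character layout from the question (17 integer digits, a comma, 2 decimals). It assumes a hand-built NumberFormatInfo with , as the decimal mark instead of a named culture such as fr-FR, so that it does not depend on which cultures are installed; $nfi, $fmt and $fixed are illustrative names.

```powershell
# Assumption: a writable copy of the invariant number format, with ',' as the decimal mark.
$nfi = [cultureinfo]::InvariantCulture.NumberFormat.Clone()
$nfi.NumberDecimalSeparator = ','
$nfi.NumberGroupSeparator   = ' '

# 17 zero placeholders, then the decimal point placeholder and 2 decimals.
$fmt = '0' * 17 + '.00'

$fixed = foreach ($s in '4000000', '128586256,9', '717,01') {
    $n = [double]::Parse($s, $nfi)   # string -> number, ',' read as the decimal mark
    $n.ToString($fmt, $nfi)          # number -> string, ',' written as the decimal mark
}

$fixed
# 00000000004000000,00
# 00000000128586256,90
# 00000000000000717,01
```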
zerocukor287 has provided the crucial pointer:
To unconditionally represent a digit in a formatted string and default to 0 in the absence of an available digit, use 0, the zero placeholder in a .NET custom numeric format string.
By contrast, #, the digit placeholder, represents only digits actually present in the input number.
To illustrate the difference:
PS> (9.1).ToString('.##')
9.1 # only 1 decimal place available, nothing is output for the missing 2nd
PS> (9.1).ToString('.00')
9.10 # only 1 decimal place available, 0 is output for the missing 2nd
Since your input uses commas as decimal point, you can split on the comma and format the whole number and the decimal part separately.
Something like this:
$csv = @'
Item;Price
Item1;4000000
Item2;45817,43
Item3;400000
Item4;570425,02
Item5;15864155,69
Item6;1068635,69
Item7;128586256,9
Item8;8901900,04
Item9;29393,88
Item10;126858346,88
Item11;1190011,46
Item12;2358411,95
Item13;139594,82
Item14;13929,74
Item15;11516,85
Item16;55742,78
Item17;96722,57
Item18;21408,86
Item19;717,01
Item20;54930,49
Item21;391,13
Item22;2118,64
'@ | ConvertFrom-Csv -Delimiter ';'
foreach ($item in $csv) {
    $num, $dec = $item.Price -split ','
    # Right-pad the decimal part so that ',9' becomes ',90' (not ',09'); a missing part becomes '00'.
    $item.Price = '{0:D20},{1}' -f [int64]$num, ([string]$dec).PadRight(2, '0')
}
# show on screen
$csv
# output to (new) csv file
$csv | Export-Csv -Path 'D:\Test\formatted.csv' -Delimiter ';' -NoTypeInformation
Output on screen:
Item Price
---- -----
Item1 00000000000004000000,00
Item2 00000000000000045817,43
Item3 00000000000000400000,00
Item4 00000000000000570425,02
Item5 00000000000015864155,69
Item6 00000000000001068635,69
Item7 00000000000128586256,90
Item8 00000000000008901900,04
Item9 00000000000000029393,88
Item10 00000000000126858346,88
Item11 00000000000001190011,46
Item12 00000000000002358411,95
Item13 00000000000000139594,82
Item14 00000000000000013929,74
Item15 00000000000000011516,85
Item16 00000000000000055742,78
Item17 00000000000000096722,57
Item18 00000000000000021408,86
Item19 00000000000000000717,01
Item20 00000000000000054930,49
Item21 00000000000000000391,13
Item22 00000000000000002118,64
I do things like this all the time, usually for generating computernames. That custom numeric format string reference will come in handy. If you want a literal period, you have to backslash it.
1..5 | % tostring 00000000000000000000.00
00000000000000000001.00
00000000000000000002.00
00000000000000000003.00
00000000000000000004.00
00000000000000000005.00
Adding commas to long numbers:
psdrive c | % free | % tostring '0,0' # or '#,#'
18,272,501,760
"Per mille" character ‰ :
.00354 | % tostring '#0.##‰'
3.54‰

Need to find a string using PowerShell

In a text file that looks something like this, I need to be able to read the project code, which is unique per text file.
devices :
meta : @{Projectcode=rvmf99999}
public_keys : @{Key=ssh-
select-string -pattern "rvmf" picks up the whole line, I just need rvmf and the digits after that.
# Sample input
$txt = @'
devices :
meta : @{Projectcode=rvmf99999}
public_keys : @{Key=ssh-
'@
$txt | Select-String 'rvmf\d+' | foreach { $_.Matches[0].Value } # -> 'rvmf99999'
Regex 'rvmf\d+' captures substring 'rvmf' followed by 1 or more (+) digits (\d).
The object output by Select-String has a .Matches property whose first entry's .Value property contains what the regex captured.
Specifically, the output objects are of type Microsoft.PowerShell.Commands.MatchInfo, which contains the input line (property .Line) as well as metadata about the source of the line and details about the regex-matching operation in the .Matches property.
The .Matches property contains a collection of match-information objects; unless -AllMatches was passed to Select-String, there will only be one element, however.
Each element of the .Matches collection is a System.Text.RegularExpressions.Match instance, whose .Value property contains what the regex captured as a whole.
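If a single line can contain several project codes, -AllMatches makes .Matches hold every match on that line. A small sketch, using a made-up input line and the illustrative variable names $line and $all:

```powershell
# Hypothetical input line with two codes on it.
$line = 'keys rvmf11111 and rvmf22222 on one line'

$all = $line |
    Select-String 'rvmf\d+' -AllMatches |
    ForEach-Object { $_.Matches.Value }   # .Value is collected from every Match object

$all   # rvmf11111 and rvmf22222
```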
Note: There is an upcoming feature - green-lighted, but not yet implemented as of PowerShell Core 7.0.0-preview.5 - that will greatly simplify the command:
# NOT YET IMPLEMENTED as of PowerShell Core 7.0.0-preview.5
$txt | Select-String 'rvmf\d+' -OnlyMatching # -> 'rvmf99999'
-OnlyMatching will only output the part of the line that was matched.

Pad IP addresses with leading 0's - powershell

I'm looking to pad IP addresses with 0's
example
1.2.3.4 -> 001.002.003.004
50.51.52.53 -> 050.051.052.053
Tried this:
[string]$paddedIP = $IPvariable
[string]$paddedIP.PadLeft(3, '0')
Also tried split as well, but I'm new to powershell...
You can use a combination of .Split() and -join.
('1.2.3.4'.Split('.') |
ForEach-Object {$_.PadLeft(3,'0')}) -join '.'
With this approach, you are working with strings the entire time. Split('.') creates an array element at every . character. .PadLeft(3,'0') ensures 3 characters with leading zeroes if necessary. -join '.' combines the array into a single string with each element separated by a ..
You can take a similar approach with the format operator -f.
"{0:d3}.{1:d3}.{2:d3}.{3:d3}" -f ('1.2.3.4'.Split('.') |
Foreach-Object { [int]$_ } )
The :dN format string enables N (number of digits) padding with leading zeroes.
This approach creates a string array like in the first solution. Then each element is pipelined and converted to an [int]. Lastly, the formatting is applied to each element.
To complement AdminOfThings' helpful answer, here is a more concise alternative that uses the -replace operator with a script block ({ ... }), which requires PowerShell Core (v6.1+):
PSCore> '1.2.3.50' -replace '\d+', { '{0:D3}' -f [int] $_.Value }
001.002.003.050
The script block is called for every match of regex \d+ (one or more digits), and $_ inside the script block refers to a System.Text.RegularExpressions.Match instance that represents the match at hand; its .Value property contains the matched text (string).
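Where the script-block form of -replace is not available, a similar effect can be had by calling [regex]::Replace() directly and passing a script block as the match evaluator. This is a sketch of that alternative, not part of the answer above; $m and $padded are illustrative names.

```powershell
# The script block is converted to a MatchEvaluator delegate;
# $m receives the System.Text.RegularExpressions.Match for each match.
$padded = [regex]::Replace('1.2.3.50', '\d+', { param($m) '{0:D3}' -f [int] $m.Value })

$padded   # 001.002.003.050
```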

Returning the whole string when no match in a Powershell Substring(0, IndexOf)

I have some Powershell that works with mail from Outlook folders. There is a footer on most emails starting with text "------". I want to dump all text after this string.
I have added an expression to Select-Object as follows:
$cleanser = {($_.Body).Substring(0, ($_.Body).IndexOf("------"))}
$someObj | Select-Object -Property @{ Name = 'Body'; Expression = $cleanser}
This works when the IndexOf() returns a match... but when there is no match my Select-Object outputs null.
How can I update my expression to return the original string when IndexOf returns null?
PetSerAl, as countless times before, has provided the crucial pointer in a comment on the question:
Use PowerShell's -replace operator, which implements regex-based string replacement that returns the input string as-is if the regex doesn't match:
# The script block to use in a calculated property with Select-Object later.
$cleanser = { $_.Body -replace '(?s)------.*' }
If you want to ensure that ------ only matches at the start of a line, use (?sm)^------.*; if you also want to remove the preceding newline, use (?s)\r?\n------.*
(?s) is an inline regex option that makes . match newlines too, so that .* effectively matches all remaining text, across lines.
By not specifying a replacement operand, '' (the empty string) is implied, which effectively removes the matching part from the input string (technically, a copy of the original string with the matching part removed is returned).
If regex '(?s)------.*' does not match, $_.Body is returned as-is (technically, it is the input string itself that is returned, not a copy).
The net effect is that anything starting with ------ is removed, if present.
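A quick way to see both behaviors side by side is to run the operator against one string with a footer and one without. The sample bodies below are invented for illustration, and the variant that also removes the preceding newline is used.

```powershell
# Hypothetical mail bodies: one with a footer, one without.
$withFooter    = "Hello`r`nWorld`r`n------`r`nConfidential footer"
$withoutFooter = "Hello`r`nWorld"

$a = $withFooter    -replace '(?s)\r?\n------.*'   # footer and its preceding newline removed
$b = $withoutFooter -replace '(?s)\r?\n------.*'   # no match: input returned unchanged

$a -eq $withoutFooter   # True
$b -eq $withoutFooter   # True
```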
I agree with @mklement0 and @PetSerAl: regular expressions give the best answer. Yay! Regular expressions to the rescue!
Edit:
Fixing my original post.
Going with @Adam's idea of using a script block in the expression, you simply need to add more logic to the script block to check the index first before using it:
$cleanser = {
    $index = ($_.Body).IndexOf("------")
    # IndexOf() returns -1 when the marker is absent; keep the whole string in that case.
    if ($index -eq -1) {
        $index = $_.Body.Length
    }
    ($_.Body).Substring(0, $index)
}
$someObj | Select-Object -Property @{ Name = 'Body'; Expression = $cleanser}