I have a string which looks like this:
[Error] Failed to process site: https://xxxxxxxxx/teams/xxxxxx. The remote server returned an error: (404) Not Found.
The challenge is to extract the highlighted part of this string (the trailing xxxxxx segment of the URL).
I tried some split operations, but without success.
Use the -match operator to perform a regex search for the URL, then extract the matched string value:
# define input string
$errorString = '[Error] Failed to process site: https://some.domain.tld/teams/xxxxxx. The remote server returned an error: (404) Not Found.'
# define regex pattern for a URL followed by a literal dot
$urlFollowedByDotPattern = 'https?://(?:www\.)?[-a-zA-Z0-9#:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b(?:[-a-zA-Z0-9()#:%_\+.~#?&//=]*)(?=\.)'
# perform match comparison
if ($errorString -match $urlFollowedByDotPattern) {
    # extract the substring containing the URL matched by the pattern
    $URL = $Matches[0]
    # remove everything before the last slash
    $pathSuffix = $URL -replace '^.*/'
}
If the URL was found, the $pathSuffix variable now contains the trailing xxxxxx part.
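If only the trailing segment is needed, the URL match and the path trimming can be folded into one pattern with a capture group. The following is a minimal sketch, not part of the answer above; the simplified pattern (any non-whitespace run after the scheme) is an assumption and is far less strict than the URL regex shown earlier.

```powershell
# Sample input, same shape as the error message above
$errorString = '[Error] Failed to process site: https://some.domain.tld/teams/xxxxxx. The remote server returned an error: (404) Not Found.'

# Assumed, simplified pattern: scheme, any non-whitespace run up to the last '/',
# then the final path segment (captured), then the sentence-ending dot.
$pattern = 'https?://\S+/([^/\s]+)\.(?=\s|$)'

# $Matches[1] holds the first capture group; $null signals "no URL found"
$pathSuffix = if ($errorString -match $pattern) { $Matches[1] } else { $null }
$pathSuffix
```

Returning $null on a failed match keeps a stale $Matches value from an earlier comparison from leaking into the result.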
To offer a concise alternative to Mathias R. Jessen's helpful answer using the regex-based -replace operator:
-replace can be special-cased for substring extraction by:
formulating a regex that matches the entire input string
using a capture group ((...)) inside that regex to capture the substring of interest and using that as the replacement string; e.g., $1 in the replacement string refers to what the first capture group captured.
$str = '[Error] Failed to process site: https://xxxxxxxxx/teams/xxxxxx. The remote server returned an error: (404) Not Found.'
$str -replace '.+https://.+/([^/]+)\..+', '$1' # -> 'xxxxxx'
For an explanation of the regex and the option to experiment with it, see this regex101.com page.
Note:
If the regex does not match, the whole input string is returned as-is.
If you'd rather return an empty string in that case, use a technique suggested by zett42: append |.* to the regex, which alternatively (|), if the original regex didn't match, unconditionally matches the whole input string with .*, in which case - since no capture group is then used - $1 evaluates to the empty string as the effective return value:
# Note the addition of |.*
'this has no URL' -replace '.+https://.+/([^/]+)\..+|.*', '$1' # -> ''
If such a regex is too mind-bending, you can try a multi-step approach that combines -split and -like:
$str = '[Error] Failed to process site: https://xxxxxxxxx/teams/xxxxxx. The remote server returned an error: (404) Not Found.'
# -> 'xxxxxx'
# Note the line continuations (` at the very end of the lines)
# required to spread the command across multiple lines for readability.
(
  -split $str           <# split into tokens by whitespace #> `
  -like 'https://*'     <# select the token that starts with 'https://' #> `
  -split '/'            <# split it into URL components by '/' #>
)[-1]?.TrimEnd('.')     <# select the last component and trim the trailing "." #>
Note:
The use of ?., the null-conditional member-access operator, prevents a statement-terminating error if no token of interest is found ($null is returned instead), but it requires PowerShell (Core) 7.1+.
In earlier versions or alternatively, you can replace ?.TrimEnd('.') with -replace '\.$', which defaults to an empty string instead.
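For Windows PowerShell 5.1, the -replace variant mentioned above can be sketched as a single expression. This is only an illustration of that substitution; the parentheses around the unary -split are added for readability, and the variable name $lastSegment is made up for the example.

```powershell
$str = '[Error] Failed to process site: https://xxxxxxxxx/teams/xxxxxx. The remote server returned an error: (404) Not Found.'

# Tokenize by whitespace, keep the token starting with 'https://', split it by '/',
# take the last component, and strip a trailing '.' if there is one.
$lastSegment = ((-split $str) -like 'https://*' -split '/')[-1] -replace '\.$'
$lastSegment
```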
I need to extract a list with strings that are between two special characters (= and ;).
Below is an example of the file with line types and the needed strings in bold.
The file is quite big; its type is XML.
<type="string">data source=**HOL4624**;integrated sec>
<type="string">data source=**HOL4625**;integrated sec>
I managed to find the lines matching “data source=”, but how to get the name after?
Used code is below.
Get-content regsrvr.txt | select-string -pattern "data source="
Thank you very much!
<RegisteredServers:ConnectionStringWithEncryptedPassword type="string">data source=HOL4624;integrated security=True;pooling=False;multipleactiveresultsets=False;connect timeout=30;encrypt=False;trustservercertificate=False;packet size=4096</RegisteredServers:ConnectionStringWithEncryptedPassword>
<RegisteredServers:ConnectionStringWithEncryptedPassword type="string">data source=HOL4625;integrated security=True;pooling=False;multipleactiveresultsets=False;connect timeout=30;encrypt=False;trustservercertificate=False;packet size=4096</RegisteredServers:ConnectionStringWithEncryptedPassword>
The XML is not valid, so it cannot be parsed cleanly as XML. You can, however, split the text into lines and use a regex match:
$html = @"
<RegisteredServers:ConnectionStringWithEncryptedPassword type="string">data source=HOL4624;integrated security=True;pooling=False;multipleactiveresultsets=False;connect timeout=30;encrypt=False;trustservercertificate=False;packet size=4096</RegisteredServers:ConnectionStringWithEncryptedPassword>
<RegisteredServers:ConnectionStringWithEncryptedPassword type="string">data source=HOL4625;integrated security=True;pooling=False;multipleactiveresultsets=False;connect timeout=30;encrypt=False;trustservercertificate=False;packet size=4096</RegisteredServers:ConnectionStringWithEncryptedPassword>
"@
# emit the matched 'data source=...;' part only for lines that actually match
$html -split '\r?\n' | % { if ($_ -match 'data source=.*?;') { $Matches[0] } } |
    % { ($_ -split '=')[1] -replace ';' }
HOL4624
HOL4625
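As an alternative that avoids splitting line by line, the names can be collected in one pass with [regex]::Matches and a lookbehind. This is a sketch; the shortened sample lines are assumptions standing in for the real file content.

```powershell
# Shortened stand-in for the file content (assumption: same element shape as above)
$text = @'
<RegisteredServers:ConnectionStringWithEncryptedPassword type="string">data source=HOL4624;integrated security=True</RegisteredServers:ConnectionStringWithEncryptedPassword>
<RegisteredServers:ConnectionStringWithEncryptedPassword type="string">data source=HOL4625;integrated security=True</RegisteredServers:ConnectionStringWithEncryptedPassword>
'@

# (?<=data source=) asserts the prefix without including it in the match;
# [^;]+ then takes everything up to, but not including, the next ';'.
$names = [regex]::Matches($text, '(?<=data source=)[^;]+') | ForEach-Object { $_.Value }
$names
```

For a real file, the text could be read as a single string with Get-Content -Raw before matching.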
Since the connection string is for SQL Server, let's use .NET's SqlConnectionStringBuilder to do all the work for us, like so:
# Test data, XML extraction is left as an exercise
$str = 'data source=HOL4624;integrated security=True;pooling=False;multipleactiveresultsets=False;connect timeout=30;encrypt=False;trustservercertificate=False;packet size=4096'
$builder = new-object System.Data.SqlClient.SqlConnectionStringBuilder($str)
# Check some parameters
$builder.DataSource
HOL4624
$builder.IntegratedSecurity
True
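Where loading the SQL client types is not an option, the same key lookup can be approximated with a plain hashtable. This is a rough sketch that assumes values contain no quoted or escaped ; or = characters, which SqlConnectionStringBuilder handles and this does not; the shortened connection string is sample data.

```powershell
$str = 'data source=HOL4624;integrated security=True;pooling=False;connect timeout=30'

$settings = @{}   # hashtable literals compare string keys case-insensitively
foreach ($pair in $str -split ';') {
    if (-not $pair.Trim()) { continue }      # skip empty segments, e.g. after a trailing ';'
    $key, $value = $pair -split '=', 2       # split on the first '=' only
    $settings[$key.Trim()] = $value
}

$settings['Data Source']
```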
You can build on your Select-String attempt with a more specific regex. You also don't need to call Get-Content first; instead, you can use the -Path parameter of Select-String.
The following code reads the given file and returns the value between the = and the ;:
(Select-String -Path "regsrvr.txt" -pattern "(?:data source=)(.*?)(?:;)").Matches | % {$_.groups[1].Value}
Pattern Explanation (RegEx):
You can use -pattern to capture a string with a matching regex. The regex can be described as follows:
(?: opens a non-capturing group
data source= matches the characters data source=
) closes the non-capturing group
(.*?) matches any number of characters and saves them in a capture group. The ? makes the quantifier lazy, so matching stops at the first occurrence of what follows (in this case the ;).
(?:;) is the final non-capturing group, for the closing ;
Structuring the Output
Select-String returns Microsoft.PowerShell.Commands.MatchInfo objects.
You can find the matched strings (the whole match and all captured groups) in their Matches property. We can loop through this output and return the value of the first captured group: | % {$_.groups[1].Value}
% is just an alias for ForEach-Object.
For more information, look at the Select-String documentation and experiment with some regex.
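The same pattern can be tried without a file by piping strings to Select-String. A small sketch, with shortened sample lines as assumptions; -AllMatches is added so that several occurrences on one line would all be returned, and the non-capturing groups are dropped because they do not change what is captured.

```powershell
# Shortened sample lines (assumption: stand-ins for the lines in regsrvr.txt)
$lines = @(
    '<x type="string">data source=HOL4624;integrated security=True</x>'
    '<x type="string">data source=HOL4625;integrated security=True</x>'
)

$names = $lines |
    Select-String -Pattern 'data source=(.*?);' -AllMatches |
    ForEach-Object { $_.Matches } |           # one Match object per occurrence
    ForEach-Object { $_.Groups[1].Value }     # first capture group: the server name
$names
```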
I am facing problem while using multiple patterns to retrieve the values from the second pattern.
The file contains data like below
Explore/CommonServices/AlertService.Folder
Explore/CommonServices/CIHLogger.Folder
What I am trying to do is find the text between two forward slashes i.e CommonServices and find the text between one forward slash and the dot i.e AlertService
I am able to find them using the patterns '/(.+)/' and '/([^/]+)\.' respectively. Now the challenge is how to get them into a single line
My Command is
((get-content "test2.txt") | Select-String -pattern '/(.+)/','/([^/]+)\.' -context 0,2 | foreach {"iics export --podHostName dm-us.informaticacloud.com -r us -u xxxxxx -p xxxxxxxxx --artifacts " + $_ + " --zipFilePath `"C:\Users\breddy002\Documents\NJR\SVN\" + $_.Matches[0].Groups[1].Value + "\" + $($_.Matches[1].Groups[1].value)
})
I am not sure how to get the match for the second pattern into the same output line.
Powershell version is : echo $PSVersionTable
Name Value
---- -----
PSVersion 5.1.17763.771
You could do something like the following:
switch -regex -file 'test2.txt' {
    '/(?<Slash>[^/]+)/(?<Dot>[^/\.]+)\.' {
        $Slash = $Matches.Slash
        $Dot = $Matches.Dot
        "First: {0}, Second: {1}" -f $Slash,$Dot
    }
}
Explanation:
The switch statement works like a series of if statements that are evaluated for each input item. With the -regex and -file parameters, it reads a file line by line and tests each line against the regex patterns. The result of the most recent successful match is stored in the automatic variable $Matches.
The regex pattern /(?<Slash>[^/]+)/(?<Dot>[^/\.]+)\. matches as follows:
/ is a literal match of /
(?<Slash>[^/]+) matches one or more (+) characters that are not / ([^/]). That match is stored as capture group Slash (using syntax (?<Slash>)). It can later be accessed by using the syntax $Matches.Slash.
(?<Dot>[^/\.]+) matches one or more characters that are neither / nor . ([^/\.]+). That match is stored as capture group Dot. It can later be accessed by using the syntax $Matches.Dot.
\. is a literal . match, which requires escaping with backslash.
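To get both values onto one line, as the question asks, the two named groups can be combined inside the switch branch. A sketch that uses an in-memory array instead of -file so it can be run as-is; the backslash-separated output format is an assumption about what the final path should look like.

```powershell
$lines = 'Explore/CommonServices/AlertService.Folder',
         'Explore/CommonServices/CIHLogger.Folder'

# switch iterates over the array and emits one string per matching element
$results = switch -Regex ($lines) {
    '/(?<Slash>[^/]+)/(?<Dot>[^/.]+)\.' {
        # Combine both named captures into a single string
        '{0}\{1}' -f $Matches.Slash, $Matches.Dot
    }
}
$results
```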
I have a file path, and I'm trying to replace the last two occurrences of the \ character with . and also completely remove the '{}' via PowerShell, and then store the result in a variable.
So, turn this:
xxx-xxx-xx\xxxxxxx\x\{xxxx-xxxxx-xxxx}\xxxxx\xxxxx
Into this:
xxx-xxx-xx\xxxxxxx\x\xxxx-xxxxx-xxxx.xxxxx.xxxxx
I've tried to get this working with the replace cmdlet, but this seems to focus more on replacing all occurrences or the first/last occurrence, which isn't my issue. Any guidance would be appreciated!
Edit:
So, I have an Excel file and I'm creating a PowerShell script that uses a foreach loop over every row, which amounts to thousands of entries. For each of those entries, I want to create a secondary variable that takes the full path and saves that path minus the last two slashes. Here's the portion of the script that I'm working on:
Foreach($script in $roboSource)
{
$logFileName = "$($script.a).txt".Replace('(?<=^[^\]+-[^\]+)-','.')
}
$script.a will output thousands of entries in this format:
xxx-xxx-xx\xxxxxxx\x\{xxxx-xxxxx-xxxx}\xxxxx\xxxxx
Which is expected.
I want $logFileName to output this:
xxx-xxx-xx\xxxxxxx\x\xxxx-xxxxx-xxxx.xxxxx.xxxxx
I'm just starting to understand regex, and I believe the capture group between the parenthesis should be catching at least one of the '\', but testing attempts show no changes after adding the replace+regex.
Please let me know if I can provide more info.
Thanks!
You can do this in two fairly simple -replace operations:
Remove { and }
Replace the last two \:
$str = 'xxx-xxx-xx\xxxxxxx\x\{xxxx-xxxxx-xxxx}\xxxxx\xxxxx'
$str -replace '[{}]' -replace '\\([^\\]*)\\([^\\]*)$','.$1.$2'
The second pattern matches:
\\ # 1 literal '\'
( # open first capture group
[^\\]* # 0 or more non-'\' characters
) # close first capture group
\\ # 1 literal '\'
( # open second capture group
[^\\]* # 0 or more non-'\' characters
) # close second capture group
$ # end of string
Which we replace with the first and second capture group values, but with . before, instead of \: .$1.$2
If you're using PowerShell Core version 6.1 or newer, you can also take advantage of right-to-left -split:
($str -replace '[{}]' -split '\\',-3) -join '.'
-split '\\',-3 has the same effect as -split '\\',3, but splitting from the right rather than the left.
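Applied to the loop from the question, the two -replace operations could be wrapped in a small helper. A sketch only: the function name ConvertTo-LogName and the sample array are hypothetical and stand in for the values read from the spreadsheet.

```powershell
function ConvertTo-LogName {
    param([Parameter(Mandatory)][string] $Path)
    # 1) drop the braces, 2) turn the last two '\' into '.'
    $Path -replace '[{}]' -replace '\\([^\\]*)\\([^\\]*)$', '.$1.$2'
}

# Hypothetical input; in the question these values come from $script.a
$paths = @('xxx-xxx-xx\xxxxxxx\x\{xxxx-xxxxx-xxxx}\xxxxx\xxxxx')
$logNames = foreach ($p in $paths) { ConvertTo-LogName -Path $p }
$logNames
```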
A 2-step approach is simplest in this case:
# Input string.
$str = 'xxx-xxx-xx\xxxxxxx\x\{xxxx-xxxxx-xxxx}\xxxxx\xxxxx'
# Get everything before the "{"
$prefix = $str -replace '\{.+'
# Get everything starting with the "{", remove "{" and "}",
# and replace "\" with "."
$suffix = $str.Substring($prefix.Length) -replace '[{}]' -replace '\\', '.'
# Output the combined result (or assign to $logFileName)
$prefix + $suffix
If you wanted to do it with a single -replace operation (with nesting), things get more complicated:
Note: This solution requires PowerShell Core (v6.1+)
$str -replace '(.+)\{(.+)\}(.+)',
{ $_.Groups[1].Value + $_.Groups[2].Value + ($_.Groups[3].Value -replace '\\', '.') }
Also see the elegant PS-Core-only -split based solution with a negative index (to split only a fixed number of tokens off the end) in Mathias R. Jessen's helpful answer.
Try this:
$str='xxx-xxx-xx\xxxxxxx\x\{xxxx-xxxxx-xxxx}\xxxxx\xxxxx'
# remove the braces and split on '\' to get an array
$Array=$str -replace '[{}]' -split '\\'
# join all elements except the last two with '\', then append the last two with '.'
"{0}.{1}.{2}" -f ($Array[0..($Array.Length -3)] -join '\'), $Array[-2], $Array[-1]
Below is the code where I am taking server names from a text file and concatenating with comma.
But when I am printing the value, it is coming with an extra new line after the values.
I tried doing $ServersToReboot.Trim(), but it didn't help.
$ServerList = Get-Content "D:\ServerName.txt"
$Servers=""
foreach($Server in $ServerList)
{
$Servers += $Server + ","
}
[string]$ServersToReboot= $Servers.TrimEnd(",")
The output coming as
server1,server2
---one extra line here---
Please let me know what is going wrong here.
Best as I can tell, you're attempting to comma separate your servers. I'd skip the Foreach construct myself and simply use the join operator.
$ServerList = Get-Content -Path 'D:\ServerName.txt'
$ServerList -join ','
This can be done in a single statement, as well.
$ServerList = (Get-Content -Path 'D:\ServerName.txt') -join ','
Tommy
As others have noted, it's in general much simpler to use the -join operator to join the input lines with a specifiable separator.
As for the problem of an extra empty line: Gert Jan Kraaijeveld plausibly suggests that your input file has an extra empty line at the end, while noting that it is actually not what would happen with the code you've posted, which should work fine (despite its inefficiency).
Perhaps the extra line is an artifact of how you're printing the resulting value.
To answer the related question of how to ignore empty lines in the input file:
Assuming that it is OK to simply remove all empty lines from the input, the simplest PowerShell-idiomatic solution is:
@(Get-Content D:\ServerName.txt) -ne '' -join ','
@(Get-Content D:\ServerName.txt) returns the input lines as an array[1] of strings, from which -ne '' then removes empty lines, and the result of which -join joins with separator ,.
[1] Get-Content D:\ServerName.txt would return a scalar (single string), if the input file happened to contain only 1 line, because PowerShell generally reports a single output object as itself rather than as a single-element array when pipeline output is collected.
Because of that, @(...), the array-subexpression operator - instead of just (...) - is needed in the above command: it ensures that the output from Get-Content is treated as an array, because the -ne operator acts differently with a scalar LHS and returns a Boolean rather than filtering the LHS's elements: compare 'foo' -ne '' to @('foo') -ne ''.
By contrast, the @(...) is not necessary if you pass the result (directly) to -join (which simply is a no-op with a scalar LHS):
(Get-Content D:\ServerName.txt) -join ','
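The scalar-versus-array difference can be checked with in-memory values instead of a file. A minimal sketch; the sample server names are placeholders for the file's lines.

```powershell
# Placeholder lines, including the kind of empty entries a file might contain
$lines = 'server1', '', 'server2', ''

$joined       = @($lines) -ne '' -join ','   # array LHS: -ne filters out empty strings
$scalarResult = 'foo' -ne ''                 # scalar LHS: -ne returns a Boolean
$arrayResult  = @('foo') -ne ''              # single-element array LHS: still filters

$joined
$scalarResult.GetType().Name
$arrayResult.GetType().Name
```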
My powershell command below
$BUILD_SOURCEVERSIONMESSAGE= (Get-Item Env:\BUILD_SOURCEVERSIONMESSAGE)
returns output in this format
2018-10-26T01:08:44.7409834Z BUILD_SOURCEVERSIONMESSAGE Merge 569594f057e2c4bd0320159855e81e14216ca66f into 41107d0f0db5ef2986831db2182280e0c...
I am trying to parse the string 569594f057e2c4bd0320159855e81e14216ca66f from the output above.
I tried converting the output to a string, splitting it on whitespace, and accessing the second element of the array as follows. However, I get empty string. How can I access the required string?
echo $BUILD_SOURCEVERSIONMESSAGE
$out = $BUILD_SOURCEVERSIONMESSAGE | Out-String
$out1 = $out.split()
echo $out1[1]
The concise equivalent of command Get-Item Env:\BUILD_SOURCEVERSIONMESSAGE - i.e., retrieving the value of environment variable BUILD_SOURCEVERSIONMESSAGE - is the expression $env:BUILD_SOURCEVERSIONMESSAGE.
Using the unary form of PowerShell's -split operator, which splits the input by any nonempty run of whitespace (while stripping leading and trailing whitespace), you can get the desired output as follows:
PS> (-split $env:BUILD_SOURCEVERSIONMESSAGE)[3]
569594f057e2c4bd0320159855e81e14216ca66f
Index 3 extracts the 4th token resulting from the tokenization via -split.
If you want to use string interpolation with the result:
$prefix = 'before<'; $postfix = '>after'
$val = (-split $env:BUILD_SOURCEVERSIONMESSAGE)[3]
# Output a synthesized string that applies a pre- and postfix, using
# {...} to enclose variable names to avoid ambiguity.
"${prefix}${val}${postfix}"
The above yields:
before<569594f057e2c4bd0320159855e81e14216ca66f>after
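Because a token index depends on exactly how many words precede the hash, a position-independent variant is to match the first run of 40 hexadecimal digits. A sketch, assuming the value contains a full 40-character commit hash; the sample message is copied from the question, including its truncation, and $message stands in for the environment variable's value.

```powershell
# Stand-in for $env:BUILD_SOURCEVERSIONMESSAGE (sample text from the question)
$message = 'Merge 569594f057e2c4bd0320159855e81e14216ca66f into 41107d0f0db5ef2986831db2182280e0c...'

# \b[0-9a-f]{40}\b matches a standalone run of exactly 40 hex digits;
# -match returns the first such run in $Matches[0]
$sha = if ($message -match '\b[0-9a-f]{40}\b') { $Matches[0] } else { $null }
$sha
```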