Splitting a string and selecting a substring in PowerShell

I am attempting to isolate and return a small variable string from a larger string.
I am struggling because the larger string I am extracting from is in list format. I can split this into substrings successfully, but I do not know how to select one of these substrings without returning the entire string. The string is generated by a command line process.
$StringList
AppTitle1.1.1221.aaa111
AppSubTitle
AnotherAppTitle1.1.1221.aaa111
AnotherAppSubTitle
...and so on
I can split the list string into substrings by line using regular expressions to split at whitespace (there is no whitespace within any given line).
$StringList -split "\s"
Once I have split the string into the desired substrings, however, I am not sure how to select the desired substring. The length of the list (i.e. the number of apps present in it) and the location of the app I need to retrieve the title of within that list are entirely variable, so I cannot simply use substring reference numbers. I've tried several approaches to selecting the substring, but each has simply returned the entire string, or nothing at all.
Here are two approaches I've attempted. The first returns the entire string list and the second returns nothing.
$DesiredAppTitle = Select-String -InputObject $StringList -Pattern "AnotherAppTitle"
or
$DesiredAppTitle = foreach ($_.substring in $StringList)
{
if ($_.substring -contains "AnotherAppTitle")
{
return $_.name
}
}
What I'd like for it to return is:
AnotherAppTitle1.1.1221.aaa111
I'm sure there are a million ways to do this, so if neither of my approaches seems like a good fit, I'm open to other suggestions. Any assistance would be greatly appreciated. Thanks in advance!

# Multi-line input string.
$StringList = @'
AppTitle1.1.1221.aaa111
AppSubTitle
AnotherAppTitle1.1.1221.aaa111
AnotherAppSubTitle
'@
# Split it into whitespace-separated tokens.
$tokens = -split $StringList
# Match the token of interest.
$tokens -match '^AnotherAppTitle'
The above yields:
AnotherAppTitle1.1.1221.aaa111
Note the use of the regex-matching operator, -match, with the anchor ^ to ensure that the search term matches at the start of a token, and the use of the unary form of the -split operator, which splits the input by any nonempty runs of whitespace.
As for what you tried:
If you pass a multi-line string to Select-String, it is considered a single "line" and, in case of a match, that whole "line" is output.
foreach ($_.substring in $StringList) won't even run, because $_.substring is not a valid iteration variable (you shouldn't use $_, which is an automatic variable, as an enumeration variable at all, and the .substring access breaks the syntax).
If you used $_ instead of $_.substring, the loop would technically work (even though, again, $_ shouldn't be used as an iteration variable), but the loop would only execute once, for the entire multi-line string.
Even if $_.substring did refer to a line (it doesn't), -contains is the wrong operator to use, because it tests if a LHS collection contains the RHS value in full.
Also, use break to exit a loop, not return.
Using the -match approach as demonstrated at the top is the better approach, but if you did want to solve this with a foreach loop:
$DesiredAppTitle = foreach ($token in -split $StringList) {
if ($token -match '^AnotherAppTitle') { $token; break }
}
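The two points about Select-String and -contains can be illustrated with a minimal sketch. It assumes the same sample input as above and splits on newlines (the '\r?\n' pattern is an assumption to cover both Windows and Unix line endings):

```powershell
# Sample input, mirroring the list in the question.
$StringList = @'
AppTitle1.1.1221.aaa111
AppSubTitle
AnotherAppTitle1.1.1221.aaa111
AnotherAppSubTitle
'@

# Split into individual lines first, so that Select-String
# receives one input object per line instead of one big string.
$lines = $StringList -split '\r?\n'

# .Line holds the text of each matching line.
$DesiredAppTitle = ($lines | Select-String -Pattern '^AnotherAppTitle').Line
$DesiredAppTitle   # AnotherAppTitle1.1.1221.aaa111

# -contains tests whether an element EQUALS the RHS in full;
# it does not perform substring matching.
$partial = $lines -contains 'AnotherAppTitle'                  # False
$full    = $lines -contains 'AnotherAppTitle1.1.1221.aaa111'   # True
```

Once the input is an array of lines, Select-String, -match and a foreach loop all behave as the question expected.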

Related

How to pipe results into output array

After playing around with some PowerShell script for a while, I was wondering if there is a version of this without using c#. It feels like I am missing some information on how to pipe things properly.
$packages = Get-ChildItem "C:\Users\A\Downloads" -Filter "*.nupkg" |
%{ $_.Name }
# Select-String -Pattern "(?<packageId>[^\d]+)\.(?<version>[\w\d\.-]+)(?=.nupkg)" |
# %{ @($_.Matches[0].Groups["packageId"].Value, $_.Matches[0].Groups["version"].Value) }
foreach ($package in $packages){
$match = [System.Text.RegularExpressions.Regex]::Match($package, "(?<packageId>[^\d]+)\.(?<version>[\w\d\.-]+)(?=.nupkg)")
Write-Host "$($match.Groups["packageId"].Value) - $($match.Groups["version"].Value)"
}
Originally I tried to do this with PowerShell only and thought that with @(1,2,3) you could create an array.
I ended up bypassing the issue by doing the regex with c# instead of PowerShell, which works, but I am curious how this would have been done with PowerShell only.
While there are 4 packages, the PowerShell-only version produced 8 lines. So accessing my data like $packages[0][0] to get a package ID never worked, because the 8 lines were strings, whereas I expected 4 arrays to be returned.
Terminology note re without using c#: You mean without direct use of .NET APIs. By contrast, C# is just another .NET-based language that can make use of such APIs, just like PowerShell itself.
Note:
The next section answers the following question: How can I avoid direct calls to .NET APIs for my regex-matching code in favor of using PowerShell-native commands (operators, automatic variables)?
See the bottom section for the Select-String solution that was your true objective; the tl;dr is:
# Note the `, `, which ensures that the array is output *as a single object*
%{ , @($_.Matches[0].Groups["packageId"].Value, $_.Matches[0].Groups["version"].Value) }
The PowerShell-native (near-)equivalent of your code is (note that the assumption is that $package contains the content of the input file):
# Caveat: -match is case-INSENSITIVE; use -cmatch for case-sensitive matching.
if ($package -match '(?<packageId>[^\d]+)\.(?<version>[\w\d\.-]+)(?=.nupkg)') {
"$($Matches['packageId']) - $($Matches['Version'])"
}
-match, the regular-expression matching operator, is the equivalent of [System.Text.RegularExpressions.Regex]::Match() (which you can shorten to [regex]::Match()) in that it only looks for (at most) one match.
Caveat re case-sensitivity: -match (and its rarely used alias -imatch) is case-insensitive by default, as all PowerShell operators are; for case-sensitive matching, use the c-prefixed variant, -cmatch.
By contrast, .NET APIs are case-sensitive by default; you'd have to pass the [System.Text.RegularExpressions.RegexOptions]::IgnoreCase flag to [regex]::Match() for case-insensitive matching (you may use 'IgnoreCase', which PowerShell auto-converts for you).
As of PowerShell 7.2.x, there is no operator that is the equivalent of the related return-ALL-matches .NET API, [regex]::Matches(). See GitHub issue #7867 for a green-lit but yet-to-be-implemented proposal to introduce one, named -matchall.
However, instead of directly returning an object describing what was (or wasn't) matched, -match returns a Boolean, i.e. $true or $false, to indicate whether matching succeeded.
Only if -match returns $true does information about a match become available, namely via the automatic $Matches variable, which is a hashtable reflecting the matching parts of the input string: entry 0 is always the full match, with optional additional entries reflecting what any capture groups ((...)) captured, either by index, if they're anonymous (starting with 1) or, as in your case, for named capture groups ((?<name>...)) by name.
Syntax note: Given that PowerShell allows use of dot notation (property-access syntax) even with hashtables, the above command could have used $Matches.packageId instead of $Matches['packageId'], for instance, which also works with the numeric (index-based) entries, e.g., $Matches.0 instead of $Matches[0]
Caveat: If an array (enumerable) is used as the LHS operand, -match's behavior changes:
$Matches is not populated.
filtering is performed; that is, instead of returning a Boolean indicating whether matching succeeded, the subarray of matching input strings is returned.
Note that the $Matches hashtable only provides the matched strings, not also metadata such as index and length, as found in [regex]::Match()'s return object, which is of type [System.Text.RegularExpressions.Match].
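As a minimal sketch of the scalar and array behaviors described above (the file names are hypothetical and chosen only for illustration):

```powershell
# The regex from the question; the file names below are hypothetical.
$regex = '(?<packageId>[^\d]+)\.(?<version>[\w\d\.-]+)(?=.nupkg)'

# Scalar LHS: -match returns a Boolean and populates $Matches.
$package = 'Newtonsoft.Json.13.0.1.nupkg'
if ($package -match $regex) {
    $packageId = $Matches.packageId    # dot notation
    $version   = $Matches['version']   # index notation
    "$packageId - $version"            # Newtonsoft.Json - 13.0.1
}

# Array LHS: -match acts as a filter and returns the matching elements
# rather than a Boolean.
$names    = 'A.1.0.0.nupkg', 'readme.txt'
$filtered = @($names -match $regex)
$filtered.Count                        # 1
```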
Select-String solution:
$packages |
Select-String '(?<packageId>[^\d]+)\.(?<version>[\w\d\.-]+)(?=.nupkg)' |
ForEach-Object {
"$($_.Matches[0].Groups['packageId'].Value) - $($_.Matches[0].Groups['version'].Value)"
}
Select-String outputs Microsoft.PowerShell.Commands.MatchInfo instances, whose .Matches collection contains one or more [System.Text.RegularExpressions.Match] instances, i.e. instances of the same type as returned by [regex]::Match()
Unless -AllMatches is also passed, .Matches only ever has one entry, hence the use of [0] to target that entry above.
As you can see, working with Select-String's output objects requires you to ultimately work with the same .NET type as when you call [regex]::Match() directly.
However, no method calls are required, and discovering the properties of the output objects is made easy in PowerShell via the Get-Member cmdlet.
If you want to capture the matches in a jagged array:
$capturedStrings = @(
$packages |
Select-String '(?<packageId>[^\d]+)\.(?<version>[\w\d\.-]+)(?=.nupkg)' |
ForEach-Object {
# Output an array of all capture-group matches,
# *as a single object* (note the `, `)
, $_.Matches[0].Groups.Where({ $_.Name -ne '0' }).Value
}
)
This returns an array of arrays, each element of which is the array of capture-group matches for a given package, so that $capturedStrings[0][0] returns the packageId value for the first package, for instance.
Note:
$_.Matches[0].Groups.Where({ $_.Name -ne '0' }).Value programmatically enumerates all capture-group matches and returns their .Value property values as an array, using member-access enumeration; note how name '0' must be excluded, as it represents the whole match.
With the capture groups in your specific regex, the above is equivalent to the following, as shown in a commented-out line in your question:
@($_.Matches[0].Groups['packageId'].Value, $_.Matches[0].Groups['version'].Value)
, ..., the unary form of the array-construction operator, is used as a shortcut for outputting the array (symbolized by ... here) as a whole, as a single object. By default, enumeration would occur and the elements would be emitted one by one. , ... is in effect a shortcut to the conceptually clearer Write-Output -NoEnumerate ... - see this answer for an explanation of the technique.
Additionally, @(...), the array subexpression operator, is needed in order to ensure that a jagged array (nested array) is returned even in the event that only one array is returned across all $packages.
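The effect of the unary comma can be seen in isolation with a small sketch. The values are made up, and the pattern of two inputs each producing a two-element array mirrors the "8 lines instead of 4 arrays" symptom from the question on a smaller scale:

```powershell
# Without the unary comma, each two-element array is enumerated
# on output, so the results are flattened into four strings.
$flat = @( 1..2 | ForEach-Object { @("id$_", "v$_") } )
$flat.Count      # 4

# With the unary comma, each array is output as a single object,
# so the result is an array of two arrays.
$nested = @( 1..2 | ForEach-Object { , @("id$_", "v$_") } )
$nested.Count    # 2
$nested[0][0]    # id1
$nested[1][1]    # v2
```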

Editing Powershell Object

I'm using powershell to run a command like so:
$getlist=rclone sha1sum remote:"\My Pictures\2009\03" --dry-run
Write-Output $getlist
that outputs a object with the results. Problem being I only want the first column of those results. I've tried things like custom-format --Depth 1 and the other *-format commands but they don't work on this object??
that outputs a object with the results
While that is technically true, more specifically it is an [object[]]-typed array of lines ([string] instances): assigning the stream of output lines produced by the external rclone program to a PowerShell variable implicitly creates such an array. (Arrays created by PowerShell are [object[]]-typed even if all elements are of the same type, such as [string] in this case.)
PowerShell fundamentally only "speaks text" when communicating with external programs.
Therefore, to extract substrings from these lines you must perform text parsing, as implied by AdminOfThings' comment on the question.
A simplified approach is to use the unary form of the -split operator:
# Simulate input lines whose first whitespace-separated token is to
# be extracted.
$getlist = 'foo bar baz', 'more stuff here'
$getlist.ForEach({ (-split $_)[0] })
The above yields:
foo
more
zett42's helpful answer shows a simpler alternative that relies on the -replace operator's (among others) ability to operate directly on each element of an array-valued LHS.
However, the -split approach is useful if you want to extract multiple column values.
If you don't need / want to capture all of the external program's (rclone's) output in memory first, you can use streaming processing in the pipeline, via the ForEach-Object cmdlet:
'foo bar baz', 'more stuff here' | ForEach-Object { (-split $_)[0] }
Note: While slightly slower than collecting all lines in memory up front, the advantage of a pipeline-based approach is reduced memory load: only the extracted substrings are kept in memory (if assigned to a variable).
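If more than one column is of interest, a sketch along the following lines may help. The sample lines are hypothetical and assume a "hash, whitespace, file name" layout. The binary form of -split with a limit of 2 is used here instead of the unary form, so that file names containing spaces stay in one piece:

```powershell
# Hypothetical lines in an assumed "<hash>  <file name>" layout.
$getlist = 'da39a3ee  photo 1.jpg', '356a192b  photo2.jpg'

$rows = $getlist.ForEach({
    # Split into at most 2 tokens: the hash and the rest of the line.
    $hash, $name = $_ -split '\s+', 2
    [pscustomobject] @{ Hash = $hash; Name = $name }
})

$rows[0].Hash   # da39a3ee
$rows[0].Name   # photo 1.jpg
```

Outputting custom objects rather than bare strings means the results can be filtered and sorted by property later on.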
You can use a regular expression to remove the undesired parts of the output:
$getlist = $getlist -replace '\s.*'
When a PowerShell operator such as -replace is applied to a collection, it will be applied to each element individually, creating a new array that stores the results (see Substitution in a collection).
The regular expression removes everything from the first whitespace up to the end of the string.
RegEx breakdown:
\s - a single whitespace character like space and tab
.* - any character, zero or more times

Having problem with split method using powershell

I have an XML file that contains a line such as
<!--<__AMAZONSITE id="-123456780" instance ="CATZ00124"__/>-->
and I need the id and instance values from that particular line,
that is, -123456780 and CATZ00124 in two different variables.
Below is the sample code which i have tried
$xmlfile = 'D:\Test\sample.xml'
$find_string = '__AMAZONSITE'
$array = @((Get-Content $xmlfile) | select-string $find_string)
Write-Host $array.Length
foreach ($commentedline in $array)
{
Write-Host $commentedline.Line.Split('id=')
}
I am getting below result:
<!--<__AMAZONSITE
"-123456780"
nstance
"CATZ00124"__/>
The preferred way still is to use XML tools for XML files.
As long as a line with AMAZONSITE and instance is unique in the file, this could do:
## Q:\Test\2019\09\13\SO_57923292.ps1
$xmlfile = 'D:\Test\sample.xml' # '.\sample.xml' #
## see following RegEx live and with explanation on https://regex101.com/r/w34ieh/1
$RE = '(?<=AMAZONSITE id=")(?<id>[\d-]+)" instance ="(?<instance>[^"]+)"'
if((Get-Content $xmlfile -raw) -match $RE){
$AmazonSiteID = $Matches.id
$Instance = $Matches.instance
}
LotPings' answer sensibly recommends using a regular expression with capture groups to extract the substrings of interest from each matching line.
You can incorporate that into your Select-String call for a single-pipeline solution (the assumption is that the XML comments of interest are all on a single line each):
# Define the regex to use with Select-String, which both
# matches the lines of interest and captures the substrings of interest
# ('id' and 'instance' attributes) via capture groups, (...)
$regex = '<!--<__AMAZONSITE id="(.+?)" instance ="(.+?)"__/>-->'
Select-String -LiteralPath $xmlfile -Pattern $regex | ForEach-Object {
# Output a custom object with properties reflecting
# the substrings of interest reported by the capture groups.
[pscustomobject] @{
id = $_.Matches.Groups[1].Value
instance = $_.Matches.Groups[2].Value
}
}
The result is an array of custom objects that each have an .id and .instance property with the values of interest (which is preferable to setting individual variables); in the console, the output would look something like this:
id instance
-- --------
-123456780 CATZ00124
-123456781 CATZ00125
-123456782 CATZ00126
As for what you tried:
Note: I'm discussing your use of .Split(), though for extracting a substring, as is your intent, .Split() is not the best tool, given that it is only the first step toward isolating the substring of interest.
As LotPings notes in a comment, in Windows PowerShell, $commentedline.Line.Split('id=') causes the String.Split() method to split the input string by any of the individual characters in split string 'id=', because the method overload that Windows PowerShell selects takes a char[] value, i.e. an array of characters, which is not your intent.
You could rectify this as follows, by forcing use of the overload that accepts string[] (even though you're only passing one string), which also requires passing an options argument:
$commentedline.Line.Split([string[]] 'id=', 'None') # OK, splits by whole string
Note that in PowerShell Core the logic is reversed, because .NET Core introduced a new overload with just [string] (with an optional options argument), which PowerShell Core selects by default. Conversely, this means that if you do want by-any-character splitting in PowerShell Core, you must cast the split string to [char[]].
On a general note, PowerShell has the -split operator, which is regex-based and offers much more flexibility than String.Split() - see this answer.
Applied to your case:
$commentedline.Line -split 'id='
While id= is interpreted as a regex by -split, that makes no difference here, given that the string contains no regex metacharacters (characters with special meaning); if you do want to safely split by a literal substring, use [regex]::Escape('...') as the RHS.
Note that -split is case-insensitive by default, as PowerShell generally is; however, you can use the -csplit variant for case-sensitive matching.
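For the original goal of getting the two values into separate variables, a minimal sketch with -match and named capture groups might look as follows. The sample line is the one from the question, and the variable names $id and $instance are chosen for illustration. The last lines show [regex]::Escape() making a metacharacter literal for -split:

```powershell
# The sample line from the question.
$line = '<!--<__AMAZONSITE id="-123456780" instance ="CATZ00124"__/>-->'

# Named capture groups pick out the two attribute values directly,
# without an intermediate split step.
if ($line -match 'id="(?<id>[^"]+)"\s+instance\s*="(?<instance>[^"]+)"') {
    $id       = $Matches.id         # -123456780
    $instance = $Matches.instance   # CATZ00124
}

# Splitting by a literal string that contains a regex metacharacter:
# an unescaped '.' would match any character, so escape it first.
$parts = 'a.b.c' -split [regex]::Escape('.')
$parts.Count   # 3
```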

troubles with powershell where clause

i am trying to parse a website for specific data.
$strings = $body.split(";")
$strings2 = $strings.Where({$_ -like ("*recordData[`"key`"]*")})
I figured out that the square brackets are to blame. If I use
$strings2 = $strings.Where({$_ -like ("*recordData*")})
it works fine, albeit returning way more results than I need.
Is there a way I can search for just "recordData[key]"?
$body is just the entirety of a returned webpage
thanks.
EDIT:
as requested the input data is like this
rs.currentColumn = 0;
recordData["dataGridExtraRow"] = 0;
recordData["rownum"] = "0";
recordData["key"] = '3354087';
recordData["factory"] = "cr";
in the end i need the 3354087 number, but just picking out the lines i needed was the issue, after that i can pick apart the string fine.
however, i ended up using the .contains, thanks for the suggestion.
sort of facepalmed after i saw it though.
If you are using the like operator then you are going to be having an issue with the open square bracket which is a wildcard character in PowerShell. See About_Wildcards
...
[] - Matches range of characters
...
A use case would be something like this which would return true.
"recordFata" -like "record[DF]ata"
So if you are going to use -like, you need to escape the brackets using backticks in a single-quoted string. You can avoid that by using other methods that work the way you intended.
Other options
String .Contains() Method
"sdfafdrecordData[key]asdfasdfas".Contains("recordData[key]")
Fairly basic and no need to worry about special characters for the most part.
Regex
"sdfafdrecordData[key]asdfasdfas" -match "recordData\[key]"
Note that the square braces are also regex metacharacters that need to be escaped as well.
Try using single quotes and escaping with backticks.
You can also simplify using PowerShell syntax:
$strings = $body -split ';' -like '*recordData`[key`]*'
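Note that the sample data in the question has double quotes inside the brackets (recordData["key"]), so the search term has to include them. As a sketch, the escaping can also be delegated to the [WildcardPattern]::Escape() method instead of typing the backticks by hand. The $body value below is a shortened stand-in for the real web page, and the final regex assumes the number is wrapped in single quotes, as in the sample:

```powershell
# Shortened stand-in for the web page content from the question.
$body = @'
rs.currentColumn = 0;
recordData["rownum"] = "0";
recordData["key"] = '3354087';
recordData["factory"] = "cr";
'@

# Let PowerShell escape the wildcard metacharacters ([ and ]).
$needle  = 'recordData["key"]'
$pattern = '*' + [System.Management.Automation.WildcardPattern]::Escape($needle) + '*'

# Split into statements and keep only the ones containing the search term.
$hits = @($body -split ';' -like $pattern)
$hits.Count   # 1

# Extract the number between the single quotes.
if ($hits[0] -match "'(\d+)'") {
    $key = $Matches[1]   # 3354087
}
```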

Replacing $_ substring value in powershell

I am trying to make further use of a wonderful piece of code I found when I tried to replace text at a specified line.
However trying to get it to read $_.Substring() and then using $_ -replace is giving me troubles; although I get no error messages the text does not get replaced.
Here is code that does not work:
$content = Get-Content Z:\project\folder\subfolder\newindex2.html
$content |
ForEach-Object {
if ($_.ReadCount -ge 169 -and $_.ReadCount -le 171) {
$a = $_.Substring(40,57)
$linked = '' + $a + ''
$_ -replace $a,$linked
} else {
$_
}
} |
Set-Content Z:\project\folder\subfolder\newindex2.html
The whole point is to make the content of a cell in a html table column link to a file on a webserver with the same name as what's in the cell.
I didn't have any luck trying my hand at regex trying to match the filenames, but since I managed to make it so the text that's to be replaced always ends up at the same position, I figured I'd try positional replacement instead.
The text that is to be replaced is always 57 characters long and always starts at position 40.
I looked at the variables getting set, and everything gets set correctly, except that
$_ -replace $a,$linked
does not replace anything.
Instead, the whole file just gets written anew with nothing changed. Can anyone please point to what I am missing and/or point to how to reach the result more easily? Maybe I'm using Substring wrong and should be using something else?
The first item in the right-hand argument of -replace is a regex pattern, so depending on what the substring contains, some of the characters might be regex control characters.
You can either escape it:
$_ -replace $([regex]::Escape($a)),$linked
Or use the String.Replace() method, which does not use regex:
$_.Replace($a,$linked)
Finally, as @Matt points out, you might want to avoid the find-and-replace approach altogether, since you already know at which character indices you need to insert your new value:
$_.Remove(40,57).Insert(40,$linked)
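The silent failure can be reproduced on a small scale. The following sketch uses a hypothetical line and a bracketed placeholder for the replacement text (the real replacement text is not known here), with a substring that contains regex metacharacters:

```powershell
# Hypothetical line; the substring of interest contains ( ) and .
$line   = 'see file(1).txt here'
$a      = $line.Substring(4, 11)   # file(1).txt
$linked = "[$a]"                   # placeholder for the real replacement text

# Unescaped: the parentheses form a regex group, so the pattern
# does not match the literal text and nothing is replaced.
$unchanged = $line -replace $a, $linked

# Escaped regex, literal String.Replace(), and positional
# Remove()/Insert() all produce the intended result.
$escaped    = $line -replace [regex]::Escape($a), $linked
$literal    = $line.Replace($a, $linked)
$positional = $line.Remove(4, 11).Insert(4, $linked)

$unchanged    # see file(1).txt here
$positional   # see [file(1).txt] here
```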