I'm using powershell to run a command like so:
$getlist=rclone sha1sum remote:"\My Pictures\2009\03" --dry-run
Write-Output $getlist
That outputs an object with the results. The problem is that I only want the first column of those results. I've tried things like custom-format --Depth 1 and the other *-format commands, but they don't work on this object.
that outputs an object with the results
While that is technically true, it is more specifically an [object[]]-typed array of lines ([string] instances) that assigning the stream of output lines - produced by the external rclone program - to a PowerShell variable implicitly created. (Arrays created by PowerShell are [object[]]-typed, even if all the elements are of the same type, such as [string] in this case).
PowerShell fundamentally only "speaks text" when communicating with external programs.
Therefore, to extract substrings from these lines you must perform text parsing, as implied by AdminOfThings' comment on the question.
A simplified approach is to use the unary form of the -split operator:
# Simulate lines input whose first whitespace-separated token is to
# be extracted.
$getlist = 'foo bar baz', 'more stuff here'
$getlist.ForEach({ (-split $_)[0] })
The above yields:
foo
more
zett42's helpful answer shows a simpler alternative that relies on the -replace operator's (among others) ability to operate directly on each element of an array-valued LHS.
However, the -split approach is useful if you want to extract multiple column values.
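As a rough sketch of the multi-column case, the following splits each line into at most two tokens and wraps them in a custom object. The sample lines and the property names Hash and Path are made up for illustration, and the layout (a hash, whitespace, then a path) is only an assumption about what rclone sha1sum prints.

```powershell
# Hypothetical sample lines; real input would come from rclone.
$getlist = 'da39a3ee  2009/03/a.jpg', '356a192b  2009/03/my photo.jpg'

$rows = $getlist.ForEach({
    # Binary -split with a limit of 2: the first token, then the rest of the line,
    # so that paths containing spaces stay intact.
    $hash, $path = $_ -split '\s+', 2
    [pscustomobject] @{ Hash = $hash; Path = $path }
})

$rows[1].Path   # the second column of the second line
```

Unlike the unary form, the binary form does not ignore leading whitespace, so this sketch assumes each line starts directly with the first column.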
If you don't need / want to capture all of the external program's (rclone's) output in memory first, you can use streaming processing in the pipeline, via the ForEach-Object cmdlet:
'foo bar baz', 'more stuff here' | ForEach-Object { (-split $_)[0] }
Note: While slightly slower than collecting all lines in memory up front, the advantage of a pipeline-based approach is reduced memory load: only the extracted substrings are kept in memory (if assigned to a variable).
You can use a regular expression to remove the undesired parts of the output:
$getlist = $getlist -replace '\s.*'
When a PowerShell operator such as -replace is applied to a collection, it will be applied to each element individually, creating a new array that stores the results (see Substitution in a collection).
The regular expression removes everything from the first whitespace up to the end of the string.
RegEx breakdown:
\s - a single whitespace character like space and tab
.* - any character, zero or more times
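To see the per-element behavior in isolation, here is a small sketch with made-up sample lines in place of the real rclone output:

```powershell
# Hypothetical sample lines standing in for the captured output.
$getlist = 'foo bar baz', 'more stuff here', 'single'

# -replace runs once per element; an element without whitespace
# has nothing to remove and passes through unchanged.
$firstColumns = $getlist -replace '\s.*'

$firstColumns   # foo, more, single
```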
Related
After playing around with some powershell script for a while i was wondering if there is a version of this without using c#. It feels like i am missing some information on how to pipe things properly.
$packages = Get-ChildItem "C:\Users\A\Downloads" -Filter "*.nupkg" |
%{ $_.Name }
# Select-String -Pattern "(?<packageId>[^\d]+)\.(?<version>[\w\d\.-]+)(?=.nupkg)" |
# %{ @($_.Matches[0].Groups["packageId"].Value, $_.Matches[0].Groups["version"].Value) }
foreach ($package in $packages){
$match = [System.Text.RegularExpressions.Regex]::Match($package, "(?<packageId>[^\d]+)\.(?<version>[\w\d\.-]+)(?=.nupkg)")
Write-Host "$($match.Groups["packageId"].Value) - $($match.Groups["version"].Value)"
}
Originally I tried to do this with PowerShell only and thought that with @(1,2,3) you could create an array.
I ended up bypassing the issue by doing the regex with c# instead of powershell, which works, but i am curious how this would have been done with powershell only.
While there are 4 packages, doing just the powershell version produced 8 lines. So accessing my data like $packages[0][0] to get a package id never worked because the 8 lines were strings while i expected 4 arrays to be returned
Terminology note re without using c#: You mean without direct use of .NET APIs. By contrast, C# is just another .NET-based language that can make use of such APIs, just like PowerShell itself.
Note:
The next section answers the following question: How can I avoid direct calls to .NET APIs for my regex-matching code in favor of using PowerShell-native commands (operators, automatic variables)?
See the bottom section for the Select-String solution that was your true objective; the tl;dr is:
# Note the `, `, which ensures that the array is output *as a single object*
%{ , @($_.Matches[0].Groups["packageId"].Value, $_.Matches[0].Groups["version"].Value) }
The PowerShell-native (near-)equivalent of your code is (note that the assumption is that $package contains the content of the input file):
# Caveat: -match is case-INSENSITIVE; use -cmatch for case-sensitive matching.
if ($package -match '(?<packageId>[^\d]+)\.(?<version>[\w\d\.-]+)(?=.nupkg)') {
"$($Matches['packageId']) - $($Matches['Version'])"
}
-match, the regular-expression matching operator, is the equivalent of [System.Text.RegularExpressions.Regex]::Match() (which you can shorten to [regex]::Match()) in that it only looks for (at most) one match.
Caveat re case-sensitivity: -match (and its rarely used alias -imatch) is case-insensitive by default, as all PowerShell operators are; for case-sensitive matching, use the c-prefixed variant, -cmatch.
By contrast, .NET APIs are case-sensitive by default; you'd have to pass the [System.Text.RegularExpressions.RegexOptions]::IgnoreCase flag to [regex]::Match() for case-insensitive matching (you may use 'IgnoreCase', which PowerShell auto-converts for you).
As of PowerShell 7.2.x, there is no operator that is the equivalent of the related return-ALL-matches .NET API, [regex]::Matches(). See GitHub issue #7867 for a green-lit but yet-to-be-implemented proposal to introduce one, named -matchall.
However, instead of directly returning an object describing what was (or wasn't) matched, -match returns a Boolean, i.e. $true or $false, to indicate whether matching succeeded.
Only if -match returns $true does information about a match become available, namely via the automatic $Matches variable, which is a hashtable reflecting the matching parts of the input string: entry 0 is always the full match, with optional additional entries reflecting what any capture groups ((...)) captured, either by index, if they're anonymous (starting with 1) or, as in your case, for named capture groups ((?<name>...)) by name.
Syntax note: Given that PowerShell allows use of dot notation (property-access syntax) even with hashtables, the above command could have used $Matches.packageId instead of $Matches['packageId'], for instance, which also works with the numeric (index-based) entries, e.g., $Matches.0 instead of $Matches[0]
Caveat: If an array (enumerable) is used as the LHS operand, -match's behavior changes:
$Matches is not populated.
filtering is performed; that is, instead of returning a Boolean indicating whether matching succeeded, the subarray of matching input strings is returned.
Note that the $Matches hashtable only provides the matched strings, not also metadata such as index and length, as found in [regex]::Match()'s return object, which is of type [System.Text.RegularExpressions.Match].
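The differences described above can be sketched as follows. The input strings and the simplified "name-version" regex are hypothetical and chosen only to keep the example short; they are not the regex from the question.

```powershell
# Hypothetical input strings in "name-version" form.
$single = 'widget-1.2.3'
$many   = 'widget-1.2.3', 'readme', 'gadget-0.9'

# Scalar LHS: -match returns a Boolean and fills $Matches on success.
if ($single -match '^(?<name>\w+)-(?<ver>[\d.]+)$') {
    $name = $Matches['name']   # index syntax
    $ver  = $Matches.ver       # dot notation works too
    $full = $Matches[0]        # entry 0 is the whole match
}

# Array LHS: -match acts as a filter and returns the matching elements.
$filtered = $many -match '^\w+-[\d.]+$'

# Finding all matches within one string still requires the .NET API.
$numbers = [regex]::Matches('a1 b22 c333', '\d+').Value
```

Here $filtered holds the two strings that match, and $numbers holds the three digit runs.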
Select-String solution:
$packages |
Select-String '(?<packageId>[^\d]+)\.(?<version>[\w\d\.-]+)(?=.nupkg)' |
ForEach-Object {
"$($_.Matches[0].Groups['packageId'].Value) - $($_.Matches[0].Groups['version'].Value)"
}
Select-String outputs Microsoft.PowerShell.Commands.MatchInfo instances, whose .Matches collection contains one or more [System.Text.RegularExpressions.Match] instances, i.e. instances of the same type as returned by [regex]::Match()
Unless -AllMatches is also passed, .Matches only ever has one entry, hence the use of [0] to target that entry above.
As you can see, working with Select-String's output objects requires you to ultimately work with the same .NET type as when you call [regex]::Match() directly.
However, no method calls are required, and discovering the properties of the output objects is made easy in PowerShell via the Get-Member cmdlet.
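A minimal sketch of the effect of -AllMatches and of property discovery via Get-Member, using a made-up input line rather than the package names from the question:

```powershell
# Hypothetical input line containing several numbers.
$line = 'a1 b22 c333'

# Default: only the first match per input line is reported.
$first = ($line | Select-String '\d+').Matches.Value

# With -AllMatches, .Matches holds every match found in the line.
$all = ($line | Select-String '\d+' -AllMatches).Matches.Value

# Get-Member lists the properties available on the MatchInfo object.
$props = ($line | Select-String '\d+' | Get-Member -MemberType Property).Name
```

$props includes names such as Line and Matches, which is how the .Matches property used above can be discovered.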
If you want to capture the matches in a jagged array:
$capturedStrings = @(
$packages |
Select-String '(?<packageId>[^\d]+)\.(?<version>[\w\d\.-]+)(?=.nupkg)' |
ForEach-Object {
# Output an array of all capture-group matches,
# *as a single object* (note the `, `)
, $_.Matches[0].Groups.Where({ $_.Name -ne '0' }).Value
}
)
This returns an array of arrays, each element of which is the array of capture-group matches for a given package, so that $capturedStrings[0][0] returns the packageId value for the first package, for instance.
Note:
$_.Matches[0].Groups.Where({ $_.Name -ne '0' }).Value programmatically enumerates all capture-group matches and returns their .Value property values as an array, using member-access enumeration; note how name '0' must be excluded, as it represents the whole match.
With the capture groups in your specific regex, the above is equivalent to the following, as shown in a commented-out line in your question:
@($_.Matches[0].Groups['packageId'].Value, $_.Matches[0].Groups['version'].Value)
, ..., the unary form of the array-construction operator, is used as a shortcut for outputting the array (symbolized by ... here) as a whole, as a single object. By default, enumeration would occur and the elements would be emitted one by one. , ... is in effect a shortcut to the conceptually clearer Write-Output -NoEnumerate ... - see this answer for an explanation of the technique.
Additionally, @(...), the array subexpression operator, is needed in order to ensure that a jagged array (nested array) is returned even in the event that only one array is returned across all $packages.
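The enumeration behavior is easiest to see side by side. This is a small sketch with placeholder values, not the package data from the question; it mirrors the "8 strings instead of 4 arrays" symptom on a smaller scale.

```powershell
# Without the unary comma, each inner array is enumerated: 4 strings result.
$flat = @( 'a', 'b' | ForEach-Object { @($_, 'v1') } )

# With it, each inner array is emitted as a single object: 2 arrays result.
$jagged = @( 'a', 'b' | ForEach-Object { , @($_, 'v1') } )

$flat.Count     # 4
$jagged.Count   # 2
$jagged[0][0]   # a
```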
I have an XML file that contains some lines like this:
<!--<__AMAZONSITE id="-123456780" instance ="CATZ00124"__/>-->
and I need the id and instance values from that particular line.
I need -123456780 as well as CATZ00124 in two different variables.
Below is the sample code which i have tried
$xmlfile = 'D:\Test\sample.xml'
$find_string = '__AMAZONSITE'
$array = @((Get-Content $xmlfile) | select-string $find_string)
Write-Host $array.Length
foreach ($commentedline in $array)
{
Write-Host $commentedline.Line.Split('id=')
}
I am getting below result:
<!--<__AMAZONSITE
"-123456780"
nstance
"CATZ00124"__/>
The preferred way still is to use XML tools for XML files.
As long as a line with AMAZONSITE and instance is unique in the file, this could do:
## Q:\Test\2019\09\13\SO_57923292.ps1
$xmlfile = 'D:\Test\sample.xml' # '.\sample.xml' #
## see following RegEx live and with explanation on https://regex101.com/r/w34ieh/1
$RE = '(?<=AMAZONSITE id=")(?<id>[\d-]+)" instance ="(?<instance>[^"]+)"'
if((Get-Content $xmlfile -raw) -match $RE){
$AmazonSiteID = $Matches.id
$Instance = $Matches.instance
}
LotPings' answer sensibly recommends using a regular expression with capture groups to extract the substrings of interest from each matching line.
You can incorporate that into your Select-String call for a single-pipeline solution (the assumption is that the XML comments of interest are all on a single line each):
# Define the regex to use with Select-String, which both
# matches the lines of interest and captures the substrings of interest
# ('id' and 'instance' attributes) via capture groups, (...)
$regex = '<!--<__AMAZONSITE id="(.+?)" instance ="(.+?)"__/>-->'
Select-String -LiteralPath $xmlfile -Pattern $regex | ForEach-Object {
# Output a custom object with properties reflecting
# the substrings of interest reported by the capture groups.
[pscustomobject] @{
id = $_.Matches.Groups[1].Value
instance = $_.Matches.Groups[2].Value
}
}
The result is an array of custom objects that each have an .id and .instance property with the values of interest (which is preferable to setting individual variables); in the console, the output would look something like this:
id instance
-- --------
-123456780 CATZ00124
-123456781 CATZ00125
-123456782 CATZ00126
As for what you tried:
Note: I'm discussing your use of .Split(), though for extracting a substring, as is your intent, .Split() is not the best tool, given that it is only the first step toward isolating the substring of interest.
As LotPings notes in a comment, in Windows PowerShell, $commentedline.Line.Split('id=') causes the String.Split() method to split the input string by any of the individual characters in split string 'id=', because the method overload that Windows PowerShell selects takes a char[] value, i.e. an array of characters, which is not your intent.
You could rectify this as follows, by forcing use of the overload that accepts string[] (even though you're only passing one string), which also requires passing an options argument:
$commentedline.Line.Split([string[]] 'id=', 'None') # OK, splits by whole string
Note that in PowerShell Core the logic is reversed, because .NET Core introduced a new overload with just [string] (with an optional options argument), which PowerShell Core selects by default. Conversely, this means that if you do want by-any-character splitting in PowerShell Core, you must cast the split string to [char[]].
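One way to sidestep the edition difference is to cast explicitly in both directions, so that the overload is chosen by the cast rather than by the edition's default. A minimal sketch with a made-up sample string:

```powershell
$line = 'a-id=b'   # hypothetical sample string

# Explicit string[] overload: splits by the whole string 'id='.
$byString = $line.Split([string[]] 'id=', 'None')

# Explicit char[] overload: splits by any of the characters i, d and =.
$byChars = $line.Split([char[]] 'id=')

$byString.Count   # 2 ('a-' and 'b')
$byChars.Count    # 4 ('a-', two empty strings, and 'b')
```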
On a general note, PowerShell has the -split operator, which is regex-based and offers much more flexibility than String.Split() - see this answer.
Applied to your case:
$commentedline.Line -split 'id='
While id= is interpreted as a regex by -split, that makes no difference here, given that the string contains no regex metacharacters (characters with special meaning); if you do want to safely split by a literal substring, use [regex]::Escape('...') as the RHS.
Note that -split is case-insensitive by default, as PowerShell generally is; however, you can use the -csplit variant for case-sensitive matching.
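The following sketch illustrates both points, case sensitivity and escaping, with made-up sample strings:

```powershell
# Case-insensitive by default: both 'x' and 'X' act as separators.
$ci = 'aXbxc' -split 'x'      # a, b, c

# Case-sensitive variant: only the lowercase 'x' is a separator.
$cs = 'aXbxc' -csplit 'x'     # aXb, c

# Escape the separator when it contains regex metacharacters such as '.'.
$literal = '1.2.3' -split [regex]::Escape('.')   # 1, 2, 3
```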
What does this mean: $_ and % in Powershell?
1..10 | Foreach {if($_%2){"$_ is odd number"}}
%
In your case, it is the modulus operator. It will return the remainder of dividing the left-hand side value by the right-hand side value.
It is also defined by default as a PowerShell alias for ForEach-Object. You can run the Get-Alias command to see other aliases that consist of special characters, such as Where-Object's alias ?.
$_
Synonymous with $PSItem
Contains the current object in the pipeline object
In your case, it represents the current object passed into your Foreach-Object script block ({}).
It will commonly show up in the Where-Object {} script block and Select-Object hash tables.
@
A literal @ character
Denotes splatting
The syntax is @VariableName. The variable can be an array or hash table. It is commonly used with a hash table or dictionary where the Name property represents a parameter name and the value property is the value for that parameter. Then that variable is splatted into another command. An example is Get-Process @Params.
Used for declaring and initializing arrays via the array sub-expression operator @().
Examples are $myArray = @() and $myArray = @("value1","value2").
Used to create and/or initialize a hash table
The syntax is $variable = @{} or $variable = @{Property=Value}.
Used in here-strings
Here-strings are special-case strings that can span multiple lines and contain special characters
Denoted by beginning a string value with @' or @" and closing the string value with a corresponding '@ or "@.
The here-string open and close characters should be isolated on their respective lines of the right-hand side (RHS).
Common At symbol
Used in email address construction, i.e. user@domain.com.
Used in external program remote logon syntax, i.e. user@hostname.
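The symbols covered in this answer can be seen together in one short sketch. All variable names and sample values here are illustrative only.

```powershell
# % as the modulus operator, $_ as the current pipeline object.
$odd = 1..10 | ForEach-Object { if ($_ % 2) { "$_ is odd number" } }

# % as the alias of ForEach-Object.
$doubled = 1..3 | % { $_ * 2 }

# @{} creates a hash table; @name splats it as named parameters.
$sortParams = @{ Property = 'Length'; Descending = $true }
$sorted = 'a', 'ccc', 'bb' | Sort-Object @sortParams

# @() guarantees an array, even for zero or one element.
$empty = @()
$one   = @('value1')

# A single-quoted here-string: the opening and closing delimiters
# each sit on their own line, and the content is kept literally.
$text = @'
line one
"quotes" and $dollars are kept literally
'@
```

In the first line, % appears in one role (modulus) while ForEach-Object is spelled out; in the second line, % plays the alias role instead.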
Extra Reading and Notable Links:
See About Arithmetic Operators for information on modulus among other arithmetic operators.
See Foreach-Object for more information about Foreach-Object and how objects are processed.
See About Splatting for more information and usage of splatting.
Another good resource is About Automatic Variables, which will list PowerShell's reserved/automatic variables. They are created and maintained by PowerShell. You will notice there are some variables that have non-alpha and non-numeric characters. You should only use these variables for their intended purposes and not use their names when you create your own custom variables.
See About Arrays for details on the array sub-expression operator.
See About Hash Tables for details on creating and manipulating hash table objects.
See About Quoting Rules to see more information and examples of using here-strings.
This would likely be a non-issue with expert regex comprehension. And only matters because I am running multiple chained replace commands that affect some of the same text in a text file. I also imagine partitioning the txt files based on how delimiter words --that are requiring multiple replaces-- are used, before replace, would help. With that said basic structural knowledge of powershell is useful and I have not found many great resources (open to suggestions!).
The question: Do chained powershell replace commands execute one after the other?
-replace "hello:","hello " `
-replace "hello ","hello:"
} | out-file ...
Would this silly example above yield hello:'s where there were initially hello:'s?
From working through some projects I gather that the above works most of the time. Yet there always seem to be some edge cases. Is this another aspect of the script or is the order that chained commands (decent number of them) execute in never variable?
What you have there are operators, not commands.
I say that not to be pedantic, but because "command" has a specific meaning in PowerShell (it is a general name encompassing functions, cmdlets, aliases, applications, filters, configurations (this is a DSC construct), workflows, and scripts), and because the way they can be used together is different.
Most operators are reserved words that begin with - (but other things count as operators, like casting), and you can indeed use them chained together. They also execute in order.
I need to clarify; they don't necessarily execute in the order given when you mix operators. Multiple of the same operator will because they all have the same precedence, but you should check about_Operator_Precedence to see the order that will be used when you combine them.
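Applied to the example from the question, a minimal sketch (the input strings are made up) shows the left-to-right order, and also one way apparent "edge cases" can arise:

```powershell
# Step by step, to make the left-to-right order visible.
$step1 = 'hello:world' -replace 'hello:', 'hello '   # hello world
$step2 = $step1 -replace 'hello ', 'hello:'          # hello:world

# The same thing chained in one expression.
$chained = 'hello:world' -replace 'hello:', 'hello ' -replace 'hello ', 'hello:'

# The second -replace also rewrites text that already contained
# 'hello ' before the first one ran.
$mixed = 'hello there, hello:world' -replace 'hello:', 'hello ' -replace 'hello ', 'hello:'
$mixed   # hello:there, hello:world
```

So the chain itself is deterministic; surprises tend to come from a later replacement matching text that an earlier one did not produce.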
Note that some operators can "short-circuit" (which may sound like a malfunction, but it isn't): certain Boolean operators do not evaluate their later operands once the Boolean result can no longer change.
For example:
$true -or $false
In this example, the $false part of the expression will never actually be evaluated. This is important if the next part of the expression is complex or even invalid. Consider these:
$true -or $(throw)
$false -or $(throw)
The first will return $true because (presumably) nothing in the coming expression could make it $false.
The second line must evaluate the second expression, and in doing so it throws an exception, halting the program.
So, aside from that aside, yes, you can continue to chain your operators. You also don't need a line continuation character (backtick `) at the end of the line if the operator itself is at the end. More useful with boolean operators:
$a -and
$b -or
$c -xor
$false
A little awkward with something like replace:
'apple' -replace
'p',
'z'
Regarding this:
And only matters because I am running multiple chained replace
commands that affect some of the same text in a text file.
These operators aren't touching anything in a file, they are working with data in memory, as literals or variables in your script (what you do with it then, like writing to a file is your business).
Further, even then it doesn't change any values already in variables, it returns new ones, which you may assign to a variable or use in any other way.
$var = 'apple'
$var -replace 'p','Z'
$var
The value of the replacement will be returned, but nothing was done with it so it went out to the console. Then you can see that $var was not modified at all, as opposed to:
$var = 'apple'
$var = $var -replace 'p','Z'
$var
Where the value of $var was overwritten.
If there are edge cases, it's likely to be a misunderstanding of something in the sequence of events (an incorrect regular expression, not assigning or using a value, incorrect logic, etc.), as the order of operations will be consistent. If you have any such edge cases, please post them!
I am attempting to isolate and return a small variable string from a larger string.
I am struggling because the larger string I am extracting from is in list format. I can split this into substrings successfully, but I do not know how to select one of these substrings without returning the entire string. The string is generated by a command line process.
$StringList
AppTitle1.1.1221.aaa111
AppSubTitle
AnotherAppTitle1.1.1221.aaa111
AnotherAppSubTitle
...and so on
I can split the list string into substrings by line using regular expressions to split at whitespace (there is no whitespace within any given line).
$StringList -split "\s"
Once I have split the string into the desired substrings, however, I am not sure how to select the desired substring. The length of the list (i.e. the number of apps present in it) and the location of the app I need to retrieve the title of within that list are entirely variable, so I cannot simply use substring reference numbers. I've tried several approaches to selecting the substring, but each has simply returned the entire string, or nothing at all.
Here are two approaches I've attempted. The first returns the entire string list and the second returns nothing.
$DesiredAppTitle = Select-String -InputObject $StringList -Pattern "AnotherAppTitle"
or
$DesiredAppTitle = foreach ($_.substring in $StringList)
{
if ($_.substring -contains "AnotherAppTitle")
{
return $_.name
}
}
What I'd like for it to return is:
AnotherAppTitle1.1.1221.aaa111
I'm sure there are a million ways to do this, so if neither of my approaches seems like a good fit, I'm open to other suggestions. Any assistance would be greatly appreciated. Thanks in advance!
# Multi-line input string.
$StringList = @'
AppTitle1.1.1221.aaa111
AppSubTitle
AnotherAppTitle1.1.1221.aaa111
AnotherAppSubTitle
'@
# Split it into whitespace-separated tokens.
$tokens = -split $StringList
# Match the token of interest.
$tokens -match '^AnotherAppTitle'
The above yields:
AnotherAppTitle1.1.1221.aaa111
Note the use of regex-matching operator with anchor ^ to ensure that the search term matches at the start of a token, and the use of the unary form of the -split operator, which splits the input by any nonempty whitespace runs.
As for what you tried:
If you pass a multi-line string to Select-String, it is considered a single "line" and, in case of a match, that whole "line" is output.
foreach ($_.substring in $StringList) won't even run, because $_.substring is not a valid iteration variable (you shouldn't use $_, which is an automatic variable, as an enumeration variable at all, and the .substring access breaks the syntax).
If you used $_ instead of $_.substring, the loop would technically work (even though, again, $_ shouldn't be used as an iteration variable), but the loop would only execute once, for the entire multi-line string.
Even if $_.substring did refer to a line (it doesn't), -contains is the wrong operator to use, because it tests if a LHS collection contains the RHS value in full.
Also, use break to exit a loop, not return.
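The first and third points above can be checked with a short sketch. The list is rebuilt here as a double-quoted string with embedded newlines purely so the snippet is self-contained.

```powershell
$StringList = "AppTitle1.1.1221.aaa111`nAppSubTitle`nAnotherAppTitle1.1.1221.aaa111`nAnotherAppSubTitle"

# One multi-line string is a single "line" to Select-String: a match returns all of it.
$whole = ($StringList | Select-String 'AnotherAppTitle').Line

# Splitting first lets Select-String see each token separately.
$token = (-split $StringList | Select-String 'AnotherAppTitle').Line

# -contains tests whole elements of a collection, not substrings.
$tokens  = -split $StringList
$exact   = $tokens -contains 'AppSubTitle'       # True: an element equals it
$partial = $tokens -contains 'AnotherAppTitle'   # False: no element equals it
```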
Using the -match approach as demonstrated at the top is the better approach, but if you did want to solve this with a foreach loop:
$DesiredAppTitle = foreach ($token in -split $StringList) {
if ($token -match '^AnotherAppTitle') { $token; break }
}