What constitutes a "line" for Select-String method in Powershell? - powershell

I would expect Select-String to consider \r\n (carriage return + newline) the end of a line in PowerShell.
However, as can be seen below, abc matches the whole input:
PS C:\Tools\hashcat> "abc`r`ndef" | Select-String -Pattern "abc"
abc
def
If I break the string up into two parts, then Select-String behaves as I would expect:
PS C:\Tools\hashcat> "abc", "def" | Select-String -Pattern "abc"
abc
How can I give Select-String a string whose lines are terminated by \r\n, and then make this cmdlet return only those lines that contain a match?

Select-String operates on each (stringified on demand[1]) input object.
A multi-line string such as "abc`r`ndef" is a single input object.
By contrast, "abc", "def" is a string array with two elements, passed as two input objects.
To ensure that the lines of a multi-line string are passed individually, split the string into an array of lines using PowerShell's -split operator: "abc`r`ndef" -split "`r?`n"
(The ? makes the `r optional so as to also correctly deal with `n-only (LF-only, Unix-style) line endings.)
In short:
"abc`r`ndef" -split "`r?`n" | Select-String -Pattern "abc"
The equivalent, using a verbatim (single-quoted) PowerShell string literal with regular-expression (regex) escape sequences (the RHS of -split is a regex):
"abc`r`ndef" -split '\r?\n' | Select-String -Pattern "abc"
It is somewhat unfortunate that the Select-String documentation talks about operating on lines of text, given that the real units of operations are input objects - which may themselves comprise multiple lines, as we've seen.
Presumably, this comes from the typical use case of providing input objects via the Get-Content cmdlet, which outputs a text file's lines one by one.
Note that Select-String doesn't return the matching strings directly, but wraps them in [Microsoft.PowerShell.Commands.MatchInfo] objects containing helpful metadata about the match.
Even there the line metaphor is present, however, as it is the .Line property that contains the matching string.
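As a quick illustration of the last two points, here is a minimal sketch (the variable name $hit is made up for this example) that inspects what Select-String actually returns:

```powershell
# Split the multi-line string into lines first, so that each line
# becomes its own input object for Select-String.
$hit = "abc`r`ndef" -split '\r?\n' | Select-String -Pattern 'abc'

# Select-String returns a MatchInfo wrapper, not a plain string.
$hit.GetType().FullName

# The matching input string itself is stored in the .Line property.
$hit.Line
```

Only the abc element matches, so $hit holds a single MatchInfo object whose .Line property is the string abc.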
[1] Optional reading: How Select-String stringifies input objects
If an input object isn't a string already, it is converted to one, though possibly not in the way you might expect:
Loosely speaking, the .ToString() method is called on each non-string input object[2], which for non-strings is not the same as the representation you get with PowerShell's default output formatting (the latter is what you see when you print an object to the console or use Out-File, for instance); by contrast, it is the same representation you get with string interpolation in a double-quoted string (when you embed a variable reference or command in "...", e.g., "$HOME" or "$(Get-Date)").
Often, .ToString() just yields the name of the object's type, without containing any instance-specific information; e.g., $PSVersionTable stringifies to System.Management.Automation.PSVersionHashTable.
# Matches NOTHING, because Select-String sees
# 'System.Management.Automation.PSVersionHashTable' as its input.
$PSVersionTable | Select-String PSVersion
In case you do want to search the default output format line by line, use the following idiom:
... | Out-String -Stream | Select-String ...
However, note that for non-string input it is more robust and preferable for subsequent processing to filter the input by querying properties with a Where-Object condition.
That said, there is a strong case to be made for Select-String needing to implicitly apply Out-String -Stream stringification, as discussed in this GitHub feature request.
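To make the two approaches concrete, here is a small sketch using made-up sample objects ($items and its Name and Size properties are assumptions for illustration only): the first command filters by property with Where-Object, the second searches the formatted display text line by line.

```powershell
# Hypothetical sample objects standing in for real command output.
$items = @(
    [pscustomobject]@{ Name = 'alpha'; Size = 10 }
    [pscustomobject]@{ Name = 'beta';  Size = 200 }
)

# Preferred for objects: filter on a property; the objects stay intact.
$big = $items | Where-Object Size -gt 100

# Text-based alternative: search the default display output line by line.
$lineHits = $items | Out-String -Stream | Select-String 'beta'

$big.Name
$lineHits.Line
```

The Where-Object result is still an object with usable properties, whereas the Select-String result is only a line of display text.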
[2] More accurately, .psobject.ToString() is called, either as-is, or - if the object's ToString method supports an IFormatProvider-typed argument - as .psobject.ToString([cultureinfo]::InvariantCulture) so as to obtain a culture-invariant representation - see this answer for more information.

"abc`r`ndef"
is one string which if you echo (Write-Output) out in console would result in:
PS C:\Users\gpunktschmitz> echo "abc`r`ndef"
abc
def
Select-String outputs every input string that contains "abc". As "abc" is part of this string, the whole string is selected.
"abc", "def"
is a list of two strings. Here, Select-String tests first "abc" and then "def" against the pattern "abc". As only the first one matches, only it is selected.
Use the following to split the string into a list and select only the elements containing "abc"
"abc`r`ndef".Split("`r`n") | Select-String -Pattern "abc"

Mr. Guenther Schmitz has basically explained the correct usage of Select-String; I just want to add some points to support his answer.
I did some reverse engineering on the Select-String cmdlet. It lives in Microsoft.PowerShell.Utility.dll. Some relevant code snippets follow; note that this is decompiled code for reference, not the actual source code.
string text = inputObject.BaseObject as string;
...
matchInfo = (inputObject.BaseObject as MatchInfo);
object operand = ((object)matchInfo) ?? ((object)inputObject);
flag2 = doMatch(operand, out matchInfo2, out text);
We can see that it just treats the inputObject as one whole string; it doesn't do any splitting.
I couldn't find the actual source code of this cmdlet on GitHub; probably this utility part is not open source yet. But I did find the unit tests for Select-String.
$testinputone = "hello","Hello","goodbye"
$testinputtwo = "hello","Hello"
The test strings used in the unit tests are actually lists of strings. This suggests that your use case was not even considered, and that the cmdlet was very possibly designed just to accept a collection of strings as input.
However, if we look at Microsoft's official documentation for Select-String, we do see that it talks about lines a lot, even though the cmdlet can't recognize a line inside a string. My personal guess is that the concept of a line is only meaningful when the cmdlet accepts a file as input; in that case the file is like a list of strings, where each item in the list represents a single line.
Hope this makes things clearer.

Related

attempting to convert an output into a numeric value

Trying to setup a powershell monitor for maintenance mode value from the output of ./repcli status:
This returns a long list of values and I'm trying to return the status for maintenance mode being [disabled] or [enabled]
The line of interest looks like this, for example:
Maintenance mode = [enabled]
I'd like to determine whether the line of interest contains [enabled] or [disabled], and return 1 in the former case, 0 in the latter.
What I tried:
./repcli.exe status | out-string | select-string -pattern 'maintenance'
This returns all output lines, which is not what I want.
Try the following:
# In the output from ./repcli.exe status, extract the line that contains
# substring 'maintenance' and see if it contains substring '[enabled]'
# Note: In PSv7+, you can replace '| ForEach Line' with '-Raw'
$enabled =
(./repcli.exe status | Select-String maintenance | ForEach Line) -match '\[enabled\]'
# Map the enabled status to 1 ($true) or 0 ($false)
return [int] $enabled
Note the use of the -match operator, which, due to being regex-based, requires escaping [ and ] with \ in order to be used literally. Select-String too uses regexes, except if you pass -SimpleMatch.
As for what you tried:
Out-String (without -Stream) returns a single string, comprising all the lines output by your ./repcli.exe call.
Therefore, if Select-String finds a match, it returns the entire string, not just the line on which the pattern is found.
You can avoid that problem by simply omitting the Out-String call, given that PowerShell automatically relays output lines from calls to external programs line by line.
While Select-String then only returns matching lines, note that it doesn't do so directly, but wraps them in Microsoft.PowerShell.Commands.MatchInfo instances that accompany the actual line text, in property .Line, with metadata about the match.
Pipe to | ForEach Line or, in PowerShell (Core) 7+ only, add the -Raw switch to get only the line text.
Of course, you then need to examine the line text returned for the substring of interest, as shown above.
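The whole approach can be tried without the external program. The following is a sketch only: $statusLines simulates the line-by-line output of ./repcli.exe status, and all lines other than the maintenance line are invented for illustration.

```powershell
# Simulated output lines (assumption: the real tool emits one setting per line).
$statusLines = @(
    'Replication = [running]'
    'Maintenance mode = [enabled]'
    'Log level = [info]'
)

# Extract the text of the line that mentions 'maintenance' ...
$line = $statusLines | Select-String maintenance | ForEach-Object Line

# ... and test it for the literal substring '[enabled]' (brackets escaped).
$enabled = $line -match '\[enabled\]'

# Map $true / $false to 1 / 0.
[int] $enabled
```

With the sample lines above, the result is 1; changing the maintenance line to [disabled] yields 0.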

How to pipe results into output array

After playing around with some powershell script for a while i was wondering if there is a version of this without using c#. It feels like i am missing some information on how to pipe things properly.
$packages = Get-ChildItem "C:\Users\A\Downloads" -Filter "*.nupkg" |
%{ $_.Name }
# Select-String -Pattern "(?<packageId>[^\d]+)\.(?<version>[\w\d\.-]+)(?=.nupkg)" |
# %{ @($_.Matches[0].Groups["packageId"].Value, $_.Matches[0].Groups["version"].Value) }
foreach ($package in $packages){
$match = [System.Text.RegularExpressions.Regex]::Match($package, "(?<packageId>[^\d]+)\.(?<version>[\w\d\.-]+)(?=.nupkg)")
Write-Host "$($match.Groups["packageId"].Value) - $($match.Groups["version"].Value)"
}
Originally i tried to do this with powershell only and thought that with @(1,2,3) you could create an array.
I ended up bypassing the issue by doing the regex with c# instead of powershell, which works, but i am curious how this would have been done with powershell only.
While there are 4 packages, doing just the powershell version produced 8 lines. So accessing my data like $packages[0][0] to get a package id never worked because the 8 lines were strings while i expected 4 arrays to be returned
Terminology note re without using c#: You mean without direct use of .NET APIs. By contrast, C# is just another .NET-based language that can make use of such APIs, just like PowerShell itself.
Note:
The next section answers the following question: How can I avoid direct calls to .NET APIs for my regex-matching code in favor of using PowerShell-native commands (operators, automatic variables)?
See the bottom section for the Select-String solution that was your true objective; the tl;dr is:
# Note the `, `, which ensures that the array is output *as a single object*
%{ , @($_.Matches[0].Groups["packageId"].Value, $_.Matches[0].Groups["version"].Value) }
The PowerShell-native (near-)equivalent of your code is (note that the assumption is that $package contains the content of the input file):
# Caveat: -match is case-INSENSITIVE; use -cmatch for case-sensitive matching.
if ($package -match '(?<packageId>[^\d]+)\.(?<version>[\w\d\.-]+)(?=.nupkg)') {
"$($Matches['packageId']) - $($Matches['Version'])"
}
-match, the regular-expression matching operator, is the equivalent of [System.Text.RegularExpressions.Regex]::Match() (which you can shorten to [regex]::Match()) in that it only looks for (at most) one match.
Caveat re case-sensitivity: -match (and its rarely used alias -imatch) is case-insensitive by default, as all PowerShell operators are; for case-sensitive matching, use the c-prefixed variant, -cmatch.
By contrast, .NET APIs are case-sensitive by default; you'd have to pass the [System.Text.RegularExpressions.RegexOptions]::IgnoreCase flag to [regex]::Match() for case-insensitive matching (you may use 'IgnoreCase', which PowerShell auto-converts for you).
As of PowerShell 7.2.x, there is no operator that is the equivalent of the related return-ALL-matches .NET API, [regex]::Matches(). See GitHub issue #7867 for a green-lit but yet-to-be-implemented proposal to introduce one, named -matchall.
However, instead of directly returning an object describing what was (or wasn't) matched, -match returns a Boolean, i.e. $true or $false, to indicate whether matching succeeded.
Only if -match returns $true does information about a match become available, namely via the automatic $Matches variable, which is a hashtable reflecting the matching parts of the input string: entry 0 is always the full match, with optional additional entries reflecting what any capture groups ((...)) captured, either by index, if they're anonymous (starting with 1) or, as in your case, for named capture groups ((?<name>...)) by name.
Syntax note: Given that PowerShell allows use of dot notation (property-access syntax) even with hashtables, the above command could have used $Matches.packageId instead of $Matches['packageId'], for instance, which also works with the numeric (index-based) entries, e.g., $Matches.0 instead of $Matches[0]
Caveat: If an array (enumerable) is used as the LHS operand, -match's behavior changes:
$Matches is not populated.
filtering is performed; that is, instead of returning a Boolean indicating whether matching succeeded, the subarray of matching input strings is returned.
Note that the $Matches hashtable only provides the matched strings, not also metadata such as index and length, as found in [regex]::Match()'s return object, which is of type [System.Text.RegularExpressions.Match].
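The behavior described above can be sketched as follows; the file names are hypothetical samples, and the regex is the one from the question.

```powershell
$regex = '(?<packageId>[^\d]+)\.(?<version>[\w\d\.-]+)(?=.nupkg)'

# Scalar LHS: -match returns a Boolean and populates $Matches on success.
$package = 'Newtonsoft.Json.13.0.1.nupkg'   # hypothetical sample name
$found = $package -match $regex
$id = $Matches.packageId          # dot notation works with hashtables
$version = $Matches['version']    # so does index notation

# Array LHS: -match acts as a filter and returns the matching elements;
# $Matches is not updated by this operation.
$names = 'a1.nupkg', 'b2.txt', 'c3.nupkg'   # hypothetical sample names
$filtered = $names -match '\.nupkg$'

"$id - $version"
$filtered
```

For the sample name, the named groups capture Newtonsoft.Json and 13.0.1, and the array operation returns the two .nupkg names.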
Select-String solution:
$packages |
Select-String '(?<packageId>[^\d]+)\.(?<version>[\w\d\.-]+)(?=.nupkg)' |
ForEach-Object {
"$($_.Matches[0].Groups['packageId'].Value) - $($_.Matches[0].Groups['version'].Value)"
}
Select-String outputs Microsoft.PowerShell.Commands.MatchInfo instances, whose .Matches collection contains one or more [System.Text.RegularExpressions.Match] instances, i.e. instances of the same type as returned by [regex]::Match()
Unless -AllMatches is also passed, .Matches only ever has one entry, hence the use of [0] to target that entry above.
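A brief sketch of the difference, using an arbitrary sample string and pattern chosen for illustration:

```powershell
# Without -AllMatches: only the first match per input string is reported.
$single = 'a1b22c333' | Select-String -Pattern '\d+'

# With -AllMatches: every match in the input string is reported.
$all = 'a1b22c333' | Select-String -Pattern '\d+' -AllMatches

$single.Matches.Count
$all.Matches.Count
$all.Matches.Value   # member-access enumeration over all Match objects
```

Here .Matches holds one entry in the first case and three (1, 22, 333) in the second.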
As you can see, working with Select-String's output objects requires you to ultimately work with the same .NET type as when you call [regex]::Match() directly.
However, no method calls are required, and discovering the properties of the output objects is made easy in PowerShell via the Get-Member cmdlet.
If you want to capture the matches in a jagged array:
$capturedStrings = @(
$packages |
Select-String '(?<packageId>[^\d]+)\.(?<version>[\w\d\.-]+)(?=.nupkg)' |
ForEach-Object {
# Output an array of all capture-group matches,
# *as a single object* (note the `, `)
, $_.Matches[0].Groups.Where({ $_.Name -ne '0' }).Value
}
)
This returns an array of arrays, each element of which is the array of capture-group matches for a given package, so that $capturedStrings[0][0] returns the packageId value for the first package, for instance.
Note:
$_.Matches[0].Groups.Where({ $_.Name -ne '0' }).Value programmatically enumerates all capture-group matches and returns their .Value property values as an array, using member-access enumeration; note how name '0' must be excluded, as it represents the whole match.
With the capture groups in your specific regex, the above is equivalent to the following, as shown in a commented-out line in your question:
@($_.Matches[0].Groups['packageId'].Value, $_.Matches[0].Groups['version'].Value)
, ..., the unary form of the array-construction operator, is used as a shortcut for outputting the array (symbolized by ... here) as a whole, as a single object. By default, enumeration would occur and the elements would be emitted one by one. , ... is in effect a shortcut to the conceptually clearer Write-Output -NoEnumerate ... - see this answer for an explanation of the technique.
Additionally, @(...), the array subexpression operator, is needed in order to ensure that a jagged array (nested array) is returned even in the event that only one array is returned across all $packages.
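The effect of the unary comma is easiest to see with plain numbers. This is an illustrative sketch only; $flat and $jagged are made-up names.

```powershell
# Without the unary comma, each inner array is enumerated on output,
# so the pipeline yields a flat list of individual elements.
$flat = @( 1..2 | ForEach-Object { @($_, ($_ * 10)) } )

# With the unary comma, each inner array is output as a single object,
# so the result is an array of arrays.
$jagged = @( 1..2 | ForEach-Object { , @($_, ($_ * 10)) } )

$flat.Count
$jagged.Count
$jagged[0][1]
```

This mirrors the symptom in the question: four packages produced eight separate strings instead of four two-element arrays.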

Select-String not working on piped object using Out-String

I am doing an API request which returns a bunch of data. When I attempt to search through it with Select-String, it just spits out the entire value stored in the variable. The API I am calling is hosted on an internet server.
$return = Invoke-RestMethod -Method GET -Uri $uri -Headers @{"authorization" = $token} -ContentType "application/json"
$file = $return.data
$file | Out-String -Stream | Select-String -Pattern "word"
This returns the entire value of $file. Printing $file looks the same as the pipeline output. Why is this not working?
$file.Gettype says it is a system.object, another answer said to use Out-String, but something is not working.
$file.Gettype
IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True True String System.Object
To complement iRon7's helpful answer with the precise logic of Out-String's -Stream switch, as of PowerShell 7.1:
Out-String, like the other Out-* cmdlets such as Out-File, uses PowerShell's rich output-formatting system to generate human-friendly representations of its input objects.
Without -Stream, Out-String only ever produces a single, (typically) multiline string.
With -Stream, line-by-line output behavior typically occurs - except for input objects that happen to be multiline strings, which are output as-is.
Because this exception is both obscure and unhelpful, GitHub proposal #14638 suggests removing it.
For so-called in-band data types, -Stream works as follows, which truly results in line-by-line output:
Input objects are formatted by PowerShell's rich formatting system, and the lines that make up the resulting representation are then output one by one.
Out-of-band data types are individually formatted outside of the formatting system, by simply calling their .NET .ToString() method.
In short: data types that represent a single value are out-of-band, and in addition to [string] out-of-band data types also comprise [char] and the various (standard) numeric types, such as [int], [long], [double], ...
[string] is the only out-of-band type that itself can result in a multiline representation, because calling .ToString() on a string is effectively a no-op that returns the string itself - whether it is single- or multiline.
Therefore:
Any string - notably also a multiline string - is output as-is, as a whole, and splitting it into individual lines requires an explicit operation; e.g. (note that regex \r?\n matches both Windows-style CRLF and Unix-style LF-only newlines):
"line 1`nline 2`nline 3" -split '\r?\n' # -> 'line 1', 'line 2', 'line 3'
If your input objects are a mix of in-band objects and (invariably out-of-band) multiline strings, you can combine Out-String -Stream with -split; e.g.:
((Get-Date), "line 1`nline 2`nline 3" | Out-String -Stream) -split '\r?\n'
On closer inspection, I suspect that your issue comes from an ambiguity in the Out-String documentation:
-Stream
Indicates that the cmdlet sends a separate string for each line of an
input object. By default, the strings for each object are accumulated
and sent as a single string.
Here, the word line should be read as item.
To split your raw string into separate lines, you will need to split your string using the following command:
$Lines = $return.data -split [Environment]::NewLine
Note that this assumes that your data uses the same newline characters as the system you are working on. If this is not the case, you might want to split the lines using a regular expression, e.g.:
$Lines = $return.data -split "`r*`n"
So what does the -Stream parameter do?
It sends a separate string for each item of an input object.
Note that "input object" is singular in this definition because it is a common PowerShell best practice to use a singular name even where multiple input objects are possible.
Meaning if you use the above defined $Lines variable (or something like $Lines = Get-Content .\File.json), the input object "$Lines" is a collection of strings:
$Lines.GetType().Name
String[]
if you stream this to Out-String it will (by default) join all the items and return a single string:
($Lines | Out-String).GetType().Name
String
In comparison, if you use the -Stream parameter, it will pass each separated item from the $Lines collection directly to the next cmdlet:
($Lines | Out-String -Stream).GetType().Name
Object[]
I have created a document issue for this: #7133 "line" should be "item"
Note:
In general, it is a bad practice to peek and poke directly into a serialized string (including JSON) using string methods and/or cmdlets (like Select-String). Instead, you should use the related parser (e.g. ConvertFrom-Json) for searching and replacing, which results in an easier syntax and usually takes care of known issues and pitfalls.
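As a sketch of that advice, assuming the API had returned a raw JSON string (the JSON below and its name and note properties are invented for illustration):

```powershell
# Hypothetical JSON payload standing in for the real API response.
$json = '{"data":[{"name":"alpha","note":"contains word"},{"name":"beta","note":"nothing here"}]}'

# Parse the JSON into objects instead of searching the raw text.
$parsed = $json | ConvertFrom-Json

# Filter the parsed objects by a property value.
$hits = $parsed.data | Where-Object note -match 'word'

$hits.name
```

The result is the matching object itself, so its other properties remain available without any further string parsing.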
Select-String outputs Microsoft.PowerShell.Commands.MatchInfo objects. It seems to me that the output is somehow fancified via the PS engine or something to highlight your match, but ultimately it does print the entire matched string.
You should check out the members of the object Select-String provides, like this:
$file | Out-String -Stream | Select-String -Pattern "word" | Get-Member
TypeName: Microsoft.PowerShell.Commands.MatchInfo
Name MemberType Definition
---- ---------- ----------
...
Matches Property System.Text.RegularExpressions.Match[] Matches {get;set;}
...
What you're interested in is the Matches property. It contains a bunch of information about the match. To extract exactly what you want, look at the Value property of Matches:
($file | Out-String -Stream | Select-String -Pattern "word").Matches.Value
word
Another way:
$file | Out-String -Stream | Select-String -Pattern "word" | ForEach-Object {$_.Matches} | Select-Object -Property Value
Value
-----
word
Or
$file | Out-String -Stream | Select-String -Pattern "word" | ForEach-Object {$_.Matches} | Select-Object -ExpandProperty Value
word

Remove the at symbol ( @ ) and curly bracket ( { ) from Select-String output in Powershell

I'm parsing filenames in Powershell, and when I use Get-ChildItem | select name, I get a clean output of the files:
file1.txt
file2.txt
file3.txt
But when I try to narrow down those files with Select-String, I'm getting a weird @ and { in front of my output:
Get-ChildItem | select name | Select-String -Pattern "1"
@{file1.txt}
Is there a parameter I'm missing? If I pipe with findstr rather than Select-String it works like a charm:
Get-ChildItem | select name | Findstr "1"
file1.txt
You can simplify and speed up your command as follows:
@((Get-ChildItem).Name) -match '1'
Note: @(), the array-subexpression operator, is needed to ensure that -match operates on an array, even if only one file happens to exist in the current dir.
(...).Name uses member-access enumeration to extract all Name property values from the file-info objects returned by Get-ChildItem.
-match, the regular-expression matching operator, due to operating on an array of values, returns the sub-array of matching values.
To make your original command work:
Get-ChildItem | select -ExpandProperty Name |
Select-String -Pattern "1" | select -ExpandProperty Line
select -ExpandProperty Name makes select (Select-Object) return only the Name property values; by default (implied -Property parameter), a custom object that has a Name property is returned.
select -ExpandProperty line similarly extracts the Line property value from the Microsoft.PowerShell.Commands.MatchInfo instances that Select-String outputs.
Note that in PowerShell [Core] v7+ you could omit this step by instead using Select-String's (new) -Raw switch to request string-only output.
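The difference between the two select forms can be checked without touching the file system. In this sketch, $files is a made-up stand-in for the output of Get-ChildItem:

```powershell
# Hypothetical stand-ins for file-info objects.
$files = @(
    [pscustomobject]@{ Name = 'file1.txt' }
    [pscustomobject]@{ Name = 'file2.txt' }
)

# -Property (implied): custom objects that still wrap the Name value.
$selected = $files | Select-Object Name

# -ExpandProperty: the bare Name strings.
$expanded = $files | Select-Object -ExpandProperty Name

# Search the strings, then unwrap the MatchInfo objects again.
$result = $expanded | Select-String -Pattern '1' |
    Select-Object -ExpandProperty Line

$selected[0] -is [string]
$expanded[0] -is [string]
$result
```

Only the expanded values are strings, which is why Select-String then reports file1.txt without the @{...} wrapper.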
As for what you tried:
As stated, by not using -ExpandProperty, select name (implied -Property parameter) created a custom object ([pscustomobject] instance) with a Name property.
Select-String stringifies its input objects, if necessary, so it can perform a string search on them, which results in the representation you saw; here's a simulation:
# Stringify a custom object via an expandable string ("...")
PS> "$([pscustomobject] @{ Name = 'file1.txt' })"
@{Name=file1.txt}
As an aside:
The above stringification method is essentially like calling .ToString() on the input objects[1], which often results in useless string representations (by default, just the type name); a more useful and intuitive stringification would be to use PowerShell's rich output-formatting system, i.e. to use the string representation you would see in the console; changing Select-String's behavior to do that is the subject of this feature request on GitHub.
[1] Calling .ToString() directly on a [pscustomobject] instance is actually still broken as of PowerShell Core 7.0.0-rc.2, due to this bug; the workaround is to call .psobject.ToString() or to use an expandable string, as shown above.

Having problem with split method using powershell

I have an xml file that contains a line like this:
<!--<__AMAZONSITE id="-123456780" instance ="CATZ00124"__/>-->
and i need the id and instance values from that particular line.
I need to have -123456780 as well as CATZ00124 in 2 different variables.
Below is the sample code which i have tried
$xmlfile = 'D:\Test\sample.xml'
$find_string = '__AMAZONSITE'
$array = @((Get-Content $xmlfile) | select-string $find_string)
Write-Host $array.Length
foreach ($commentedline in $array)
{
Write-Host $commentedline.Line.Split('id=')
}
I am getting below result:
<!--<__AMAZONSITE
"-123456780"
nstance
"CATZ00124"__/>
The preferred way still is to use XML tools for XML files.
As long as a line with AMAZONSITE and instance is unique in the file, this could do:
## Q:\Test\2019\09\13\SO_57923292.ps1
$xmlfile = 'D:\Test\sample.xml' # '.\sample.xml' #
## see following RegEx live and with explanation on https://regex101.com/r/w34ieh/1
$RE = '(?<=AMAZONSITE id=")(?<id>[\d-]+)" instance ="(?<instance>[^"]+)"'
if((Get-Content $xmlfile -raw) -match $RE){
$AmazonSiteID = $Matches.id
$Instance = $Matches.instance
}
LotPings' answer sensibly recommends using a regular expression with capture groups to extract the substrings of interest from each matching line.
You can incorporate that into your Select-String call for a single-pipeline solution (the assumption is that the XML comments of interest are all on a single line each):
# Define the regex to use with Select-String, which both
# matches the lines of interest and captures the substrings of interest
# ('id' and 'instance' attributes) via capture groups, (...)
$regex = '<!--<__AMAZONSITE id="(.+?)" instance ="(.+?)"__/>-->'
Select-String -LiteralPath $xmlfile -Pattern $regex | ForEach-Object {
# Output a custom object with properties reflecting
# the substrings of interest reported by the capture groups.
[pscustomobject] @{
id = $_.Matches.Groups[1].Value
instance = $_.Matches.Groups[2].Value
}
}
The result is an array of custom objects that each have an .id and .instance property with the values of interest (which is preferable to setting individual variables); in the console, the output would look something like this:
id instance
-- --------
-123456780 CATZ00124
-123456781 CATZ00125
-123456782 CATZ00126
As for what you tried:
Note: I'm discussing your use of .Split(), though for extracting a substring, as is your intent, .Split() is not the best tool, given that it is only the first step toward isolating the substring of interest.
As LotPings notes in a comment, in Windows PowerShell, $commentedline.Line.Split('id=') causes the String.Split() method to split the input string by any of the individual characters in split string 'id=', because the method overload that Windows PowerShell selects takes a char[] value, i.e. an array of characters, which is not your intent.
You could rectify this as follows, by forcing use of the overload that accepts string[] (even though you're only passing one string), which also requires passing an options argument:
$commentedline.Line.Split([string[]] 'id=', 'None') # OK, splits by whole string
Note that in PowerShell Core the logic is reversed, because .NET Core introduced a new overload with just [string] (with an optional options argument), which PowerShell Core selects by default. Conversely, this means that if you do want by-any-character splitting in PowerShell Core, you must cast the split string to [char[]].
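Because the default overload differs between editions, explicit casts make the intent unambiguous in both. A minimal sketch with a simplified, made-up sample line:

```powershell
$line = '<x id="1" instance ="2">'   # hypothetical sample line

# Force the string[] overload: split by the whole substring 'id='.
$byString = $line.Split([string[]] 'id=', [System.StringSplitOptions]::None)

# Force the char[] overload: split by any of the characters i, d, =.
$byChars = $line.Split([char[]] 'id=')

$byString.Count
$byChars.Count
```

Splitting by the whole string yields two parts, whereas splitting by individual characters also breaks the line at the i in instance and at the second =.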
On a general note, PowerShell has the -split operator, which is regex-based and offers much more flexibility than String.Split() - see this answer.
Applied to your case:
$commentedline.Line -split 'id='
While id= is interpreted as a regex by -split, that makes no difference here, given that the string contains no regex metacharacters (characters with special meaning); if you do want to safely split by a literal substring, use [regex]::Escape('...') as the RHS.
Note that -split is case-insensitive by default, as PowerShell generally is; however, you can use the -csplit variant for case-sensitive matching.
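Both points can be sketched with arbitrary sample strings chosen for illustration:

```powershell
# '.' is a regex metacharacter (any character), so escape it
# in order to split by a literal dot.
$literal = 'a.b.c' -split [regex]::Escape('.')

# -split ignores case by default; -csplit is case-sensitive.
$insensitive = 'oneXtwoxthree' -split 'x'
$sensitive   = 'oneXtwoxthree' -csplit 'x'

$literal
$insensitive.Count
$sensitive.Count
```

The escaped split returns a, b and c; the case-insensitive split returns three parts, while -csplit splits only at the lowercase x and returns two.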