Use PowerShell -Pattern to search for multiple values on multiple files - powershell

Team --
I have a snippet of code that works as designed. It will scan all the files within a folder hierarchy for a particular word and then return the de-duped file path of the files where all instance of the word were found.
$properties = #(
'Path'
)
Get-ChildItem -Path \\Server_Name\Folder_Name\* -recurse |
Select-String -Pattern ‘Hello’ |
Select-Object $properties -Unique |
Export-Csv \\Server_Name\Folder_Name\File_Name.csv -NoTypeInformation
I'd like to
Expand this code to be able to search for multiple words at once. So all cases where 'Hello' OR 'Hola' are found... and potentially an entire list of words if possible.
Have the code return not only the file path but the word that tripped it ... with multiple lines for the same path if both words tripped it
I've found some article talking about doing multiple word searches using methods like:
where { $_ | Select-String -Pattern 'Hello' } |
where { $_ | Select-String -Pattern 'Hola' } |
OR
Select-String -Pattern ‘(Hello.Hola)|(Hola.Hello)’
These codes will run ... but return no data is returned in the output file ... it just blank with the header 'Path'.
I'm missing something obvious ... anyone spare a fresh set of eyes?
MS

Select-String's -Pattern parameter accepts an array of patterns.
Each [Microsoft.PowerShell.Commands.MatchInfo] instance output by Select-String has a .Pattern property that indicates the specific pattern that matched.
Get-ChildItem -Path \\Server_Name\Folder_Name\* -recurse |
Select-String -Pattern 'Hello', 'Hola' |
Select-Object Path, Pattern |
Export-Csv \\Server_Name\Folder_Name\File_Name.csv -NoTypeInformation
Note:
If a given matching line matches multiple patterns, only the first matching pattern is reported for it, in input order.
While adding -AllMatches normally finds all matches on a line, as of PowerShell 7.2.x this doesn't work as expected with multiple patterns passed to -Pattern, with respect to the matches reported in the .Matches property - see GitHub issue #7765.
Similarly, the .Pattern property doesn't then reflect the (potentially) multiple patterns that matched; and, in fact, this isn't even possible at the moment, given that the .Pattern property is a [string]-typed member of [Microsoft.PowerShell.Commands.MatchInfo] and therefore cannot reflect multiple patterns.

Related

Use Select-String to get the single word matching pattern from files

I am trying to get only the word when using Select-String, but instead it is returning the whole string
Select-String -Path .\*.ps1 -Pattern '-Az' -Exclude "Get-AzAccessToken","-Azure","Get-AzContext"
I want to get all words in all .ps1 files that contain '-Az', for example 'New-AzHierarchy'
Select-String outputs objects of type Microsoft.PowerShell.Commands.MatchInfo by default, which supplement the whole line (input object) on which a match was found (.Line property) with metadata about the match (in PowerShell (Core) 7+, you can use -Raw to directly output the matching lines (input objects) only).
Note that in the default display output, it appears that only the matching lines are printed, with PowerShell (Core) 7+ now highlighting the part that matched the pattern(s).
Select-String's -Include / -Exclude parameters do not modify what patterns are matched; instead, they modify the -Path argument to further narrow down the set of input files. Since a wildcard expression as part of the -Path argument is usually sufficient, these parameters are rarely used.
Therefore:
Use the objects in the .Matches collection property Select-String's output objects to access the part of the line that actually matched the given pattern(s).
Since you want to capture entire command names that contain substring -Az, such as New-AzHierarchy, you must use a regex pattern that also captures the relevant surrounding characters: \w+-Az\w+
The simplest way to exclude specific matches is to filter them out afterwards, using a Where-Object call.
# Note: -AllMatches ensures that if there are *multiple* matches
# on a single line, they are all reported.
Select-String -Path .\*.ps1 -Pattern '\w+-Az\w+' -AllMatches |
ForEach-Object { $_.Matches.Value } |
Where-Object { $_ -notin 'Get-AzAccessToken', '-Azure', 'Get-AzContext' }

Powershell - Scan for multiple strings in multiple files

I am having issues resolving a revision to the script below.
This script will take in a number of key words, either manually added or read from a file.
It will output the data when it finds a match by listing the File Name, Line number and the search word. Unfortunately if I'm searching for multiple words it has to scan the files for each separate word. That means if I have 20 search words, it will open, scan, close each file 20 times. Once for each word.
Not good as it takes time and I still have to troll through the output to total how many matches per file.
Every change I make is disastrous as it prints every single search word without knowing what word was the match or worse it fails to run.
Would anyone be able to help me alter the script to scan the files once for ALL the search words and list out only the matches in a readable way like the output below?
$searchWords="Address", "City","State"
Foreach ($sw in $searchWords)
{
Get-Childitem -Path "C:\Workspace\src" -Recurse -include "*.cbl" |
Select-String -Pattern "$sw" |
Select Path,LineNumber,#{n='SearchWord';e={$sw}}
}
-Pattern accepts an array of patterns, and which of those pattern caused a given match can be accessed via the .Pattern property of Select-String's output objects:[1]
Get-Childitem -Path "C:\Workspace\src" -Recurse -include "*.cbl" |
Select-String -Pattern "Address", "City", "State" |
Select Path, LineNumber, #{n='SearchWord';e={$_.Pattern}}
Note: I'm passing the search words as a literal array here, for brevity; in your code, simply replace "Address", "City", "State" with $searchWords (without enclosure in "...").
As an aside: Using -Filter instead of -Include can speed up your command, and, given that your command arguments don't contains spaces or other metacharacters, quoting them is optional:
Get-Childitem -Path C:\Workspace\src -Recurse -Filter *.cbl |
Select-String -Pattern Address, City, State |
Select Path, LineNumber, #{n='SearchWord';e={$_.Pattern}}
[1] Note: Only the first among the specified pattern that matches, in the order specified, that matches on a given line is reported as the matching pattern for that line - even if others would match too. Even -AllMatches doesn't change that - it would only report multiple matches per line for that first pattern.

Powershell, how to capture argument(s) of Select-String and include with matched output

Thanks to #mklement0 for the help with getting this far with answer given in Powershell search directory for code files with text matching input a txt file.
The below Powershell works well for finding the occurrences of a long list of database field names in a source code folder.
$inputFile = 'C:\DataColumnsNames.txt'
$outputFile = 'C:\DataColumnsUsages.txt'
Get-ChildItem C:\ProjectFolder -Filter *.cs -Recurse -Force -ea SilentlyContinue |
Select-String -Pattern (Get-Content $inputFile) |
Select-Object Path, LineNumber, line |
Export-csv $outputfile
However, many lines of source code have multiple matches, especially ADO.NET SQL statements with a lot of field names on one line. If the field name argument was included with the matching output the results will be more directly useful with less additional massaging such as lining up everything with the original field name list. For example if there is a source line "BatchId = NewId" it will match field name list item "BatchId". Is there an easy way to include in the output both "BatchId" and "BatchId = NewId"?
Played with the matches object but it doesn't seem to have the information. Also tried Pipeline variable like here but X is null.
$inputFile = 'C:\DataColumnsNames.txt'
$outputFile = 'C:\DataColumnsUsages.txt'
Get-ChildItem C:\ProjectFolder -Filter *.cs -Recurse -Force -ea SilentlyContinue |
Select-String -Pattern (Get-Content $inputFile -PipelineVariable x) |
Select-Object $x, Path, LineNumber, line |
Export-csv $outputile
Thanks.
The Microsoft.PowerShell.Commands.MatchInfo instances that Select-String outputs have a Pattern property that reflects the specific pattern among the (potential) array of patterns passed to -Pattern that matched on a given line.
The caveat is that if multiple patterns match, .Pattern only reports the pattern among those that matched that is listed first among them in the -Pattern argument.
Here's a simple example, using an array of strings to simulate lines from files as input:
'A fool and',
'his barn',
'are soon parted.',
'foo and bar on the same line' |
Select-String -Pattern ('bar', 'foo') |
Select-Object Line, LineNumber, Pattern
The above yields:
Line LineNumber Pattern
---- ---------- -------
A fool and 1 foo
his barn 2 bar
foo and bar on the same line 4 bar
Note how 'bar' is listed as the Pattern value for the last line, even though 'foo' appeared first in the input line, because 'bar' comes before 'foo' in the pattern array.
To reflect the actual pattern that appears first on the input line in a Pattern property, more work is needed:
Formulate your array of patterns as a single regex using alternation (|), wrapped as a whole in a capture group ((...)) - e.g., '(bar|foo)')
Note: The expression used below, '({0})' -f ('bar', 'foo' -join '|'), constructs this regex dynamically, from an array (the array literal 'bar', 'foo' here, but you can substitute any array variable or even (Get-Content $inputFile)); if you want to treat the input patterns as literals and they happen to contain regex metacharacters (such as .), you'll need to escape them with [regex]::Escape() first.
Use a calculated property to define a custom Pattern property that reports the capture group's value, which is the first among the values encountered on each input line:
'A fool and',
'his barn',
'are soon parted.',
'foo and bar on the same line' |
Select-String -AllMatches -Pattern ('({0})' -f ('bar', 'foo' -join '|')) |
Select-Object Line, LineNumber,
#{ n='Pattern'; e={ $_.Matches[0].Groups[1].Value } }
This yields (abbreviated to show only the last match):
Line LineNumber Pattern
---- ---------- -------
...
foo and bar on the same line 4 foo
Now, 'foo' is properly reported as the matching pattern.
To report all patterns found on each line:
Switch -AllMatches is required to tell Select-String to find all matches on each line, represented in the .Matches collection of the MatchInfo output objects.
The .Matches collection must then be enumerated (via the .ForEach() collection method) to extract the capture-group value from each match.
'A fool and',
'his barn',
'are soon parted.',
'foo and bar on the same line' |
Select-String -AllMatches -Pattern ('({0})' -f ('bar', 'foo' -join '|')) |
Select-Object Line, LineNumber,
#{ n='Pattern'; e={ $_.Matches.ForEach({ $_.Groups[1].Value }) } }
This yields (abbreviated to show only the last match):
Line LineNumber Pattern
---- ---------- -------
...
foo and bar on the same line 4 {foo, bar}
Note how both 'foo' and 'bar' are now reported in Pattern, in the order encountered on the line.
The solid information and examples from #mklement0 were enough to point me in the right direction for researching and understanding more about Powershell and the object pipeline and calculated properties.
I was able to finally achieve my goals of a cross referencing a list of table and field names to the C# code base.The input file is simply table and field names, pipe delimited. (one of the glitches I had was not using pipe in the split, it was a visual error that took awhile to finally see, so check for that). The output is the table name, field name, code file name, line number and actual line. It's not perfect but much better than manual effort for a few hundred fields! And now there are possibilities for further automation in the data mapping and conversion project. Thought about using C# utility programming but that might have taken just as long to figure out and implement and much more cumbersome that a working Powershell.
The key for me at this point is "working"! My first deeper dive into the abstruse world of Powershell. The key points of my solution are the use of the calculated property to get the table and field names in the output, realization that expressions can be used in certain places like to build a Pattern and that the pipeline is passing only certain specific objects after each command (maybe that is too restricted of a view but it's better than what I had before).
Hope this helps someone in future. I could not find any examples close enough to get over the hump and so asked my first ever stackoverflow questions.
$inputFile = "C:\input.txt"
$outputFile = "C:\output.csv"
$results = Get-Content $inputfile
foreach ($i in $results) {
Get-ChildItem -Path "C:\ProjectFolder" -Filter *.cs -Recurse -ErrorAction SilentlyContinue -Force |
Select-String -Pattern $i.Split('|')[1] |
Select-Object #{ n='Pattern'; e={ $i.Split('|')[0], $i.Split('|')[1] -join '|'} }, Filename, LineNumber, line |
Export-Csv $outputFile -Append}

Powershell - Find files that match a pattern for specific number of times

To find a simple pattern in a set of files in powershell I go
$pattern= 'mypattern'
$r= Get-ChildItem -Path "C:\.." -recurse |
Select-String -pattern $pattern | group path | select name
$r | Out-GridView
In my scenario, I have files that contain the pattern for more than one time and others that have the pattern for one time only. So I am interested in those files that contain the pattern for more than one time and not interested in the rest. Thanks
One approach for the start of what you are looking for is Select-String and Group-Object like you already have.
Select-String -Path (Get-ChildItem C:\temp\ -Filter *.txt -Recurse) -Pattern "140" -AllMatches |
Group-Object Path |
Where-Object{$_.Count -gt 1} |
Select Name, Count |
Out-GridView
This will take all the txt files in the temp directory and group them by the number of matches. -AllMatches is important as by default Select-String will only return the first match it finds on a line.
Of those groups we take the ones where the count is higher than one using Where-Object. Then we just output the file names and there counts with a Select Name,Count. Where name is the full file path where the matched text is located.
About Out-GridView
I see that you are assinging the output from Out-GridView to $r. If you want to do that you need to be sure you add the -PassThru parameter.

How to get files which have multiple required strings

I am trying to find the files which have given strings. I am using the below line
Get-ChildItem -recurse | Select-String -pattern "Magnet","Stew" | group path | select name
But it is giving the files which are having any one of the words "Magnet","Stew". But I want the files which have both the words. In logically speaking the above command interprets it as "Or" condition. I want "And" condition. Can anybody guide me of how to do this?
Try this for your regex:
'.*(?:stew.*magnet)|(?:magnet.*stew).*'
Edit:
If you're looking for those matches anywhere in the file:
Get-ChildItem -Recurse |
where {(
($_ | Select-String -Pattern 'Stew' -SimpleMatch -Quiet) -and
($_ | Select-String -Pattern 'Magnet' -SimpleMatch -Quiet)
)}
The solution posted at the link CB provided will work, but doesn't seem very efficient for what you're needing to do. The -SimpleMatch will be faster than using a Regex pattern, and the -Quiet switch will make it just return True or False, depending on whether it found a match in the file. If one or the other of the terms is much less likely to appear in the file, change the order so that one appears first in the test stack so the test will fail sooner and it can move on to the next file.