PowerShell - Scan for multiple strings in multiple files

I'm having trouble revising the script below.
It takes a number of keywords, either entered manually or read from a file.
When it finds a match, it outputs the file name, line number, and search word. Unfortunately, if I search for multiple words, it has to scan the files once per word: with 20 search words, it opens, scans, and closes each file 20 times.
That's slow, and I still have to trawl through the output to total the matches per file.
Every change I make is disastrous: either it prints every search word without indicating which one matched or, worse, it fails to run.
Could anyone help me alter the script so that it scans the files once for ALL the search words and lists only the matches in a readable way (file name, line number, and matching word)?
$searchWords="Address", "City","State"
Foreach ($sw in $searchWords)
{
Get-Childitem -Path "C:\Workspace\src" -Recurse -include "*.cbl" |
Select-String -Pattern "$sw" |
Select Path,LineNumber,@{n='SearchWord';e={$sw}}
}

-Pattern accepts an array of patterns, and the pattern that caused a given match can be accessed via the .Pattern property of Select-String's output objects:[1]
Get-Childitem -Path "C:\Workspace\src" -Recurse -include "*.cbl" |
Select-String -Pattern "Address", "City", "State" |
Select Path, LineNumber, @{n='SearchWord';e={$_.Pattern}}
Note: For brevity, I'm passing the search words as a literal array here. In your code, simply replace "Address", "City", "State" with $searchWords (without enclosing it in "...").
As an aside: using -Filter instead of -Include can speed up your command. Because your command arguments don't contain spaces or other metacharacters, quoting them is optional:
Get-Childitem -Path C:\Workspace\src -Recurse -Filter *.cbl |
Select-String -Pattern Address, City, State |
Select Path, LineNumber, @{n='SearchWord';e={$_.Pattern}}
[1] Note: For each line, only the first of the specified patterns (in the order specified) that matches is reported as the matching pattern, even if others would match too. -AllMatches doesn't change that: it would only report multiple matches per line for that first pattern.
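The question also asks for a total per file. Here is a minimal sketch that uses a throwaway temp folder (hypothetical) in place of C:\Workspace\src. It scans each file once for all words, then uses Group-Object to count matches per file and word. Per the footnote, a line containing several words is counted only for the first matching pattern.

```powershell
# Hypothetical stand-in for C:\Workspace\src, with two small sample .cbl files.
$root = Join-Path ([System.IO.Path]::GetTempPath()) 'sw-demo-a'
New-Item -ItemType Directory -Path $root -Force | Out-Null
Set-Content -Path (Join-Path $root 'one.cbl') -Value 'Address line', 'City name', 'Address again'
Set-Content -Path (Join-Path $root 'two.cbl') -Value 'State code', 'nothing here'

$searchWords = 'Address', 'City', 'State'

# Single pass over the files with all words; then one group per (file, word) pair,
# whose Count is the number of lines in that file credited to that word.
$summary = Get-ChildItem -Path $root -Recurse -Filter *.cbl |
  Select-String -Pattern $searchWords |
  Group-Object -Property Path, Pattern |
  Select-Object @{ n = 'File'; e = { Split-Path -Leaf $_.Group[0].Path } },
    @{ n = 'SearchWord'; e = { $_.Group[0].Pattern } },
    Count

$summary | Format-Table -AutoSize

# Clean up the sample folder.
Remove-Item -Path $root -Recurse -Force
```

Select-String treats patterns as case-insensitive regexes by default. Add -SimpleMatch if your search words should be matched literally.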

Related

Use Select-String to get the single word matching pattern from files

I am trying to get only the word when using Select-String, but instead it is returning the whole string
Select-String -Path .\*.ps1 -Pattern '-Az' -Exclude "Get-AzAccessToken","-Azure","Get-AzContext"
I want to get all words in all .ps1 files that contain '-Az', for example 'New-AzHierarchy'
Select-String outputs objects of type Microsoft.PowerShell.Commands.MatchInfo by default, which supplement the whole line (input object) on which a match was found (.Line property) with metadata about the match (in PowerShell (Core) 7+, you can use -Raw to directly output the matching lines (input objects) only).
Note that in the default display output, it appears that only the matching lines are printed, with PowerShell (Core) 7+ now highlighting the part that matched the pattern(s).
Select-String's -Include / -Exclude parameters do not modify what patterns are matched; instead, they modify the -Path argument to further narrow down the set of input files. Since a wildcard expression as part of the -Path argument is usually sufficient, these parameters are rarely used.
Therefore:
Use the objects in the .Matches collection property of Select-String's output objects to access the part of the line that actually matched the given pattern(s).
Since you want to capture entire command names that contain substring -Az, such as New-AzHierarchy, you must use a regex pattern that also captures the relevant surrounding characters: \w+-Az\w+
The simplest way to exclude specific matches is to filter them out afterwards, using a Where-Object call.
# Note: -AllMatches ensures that if there are *multiple* matches
# on a single line, they are all reported.
Select-String -Path .\*.ps1 -Pattern '\w+-Az\w+' -AllMatches |
ForEach-Object { $_.Matches.Value } |
Where-Object { $_ -notin 'Get-AzAccessToken', '-Azure', 'Get-AzContext' }
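The following sketch illustrates the earlier point about -Include / -Exclude. It uses two hypothetical script files in a temp folder. The wildcard -Exclude '*skip.ps1' removes a file from the set of input files, but unwanted words found in the remaining files still have to be filtered out of the matches with Where-Object.

```powershell
# Hypothetical sample scripts in a temp folder.
$root = Join-Path ([System.IO.Path]::GetTempPath()) 'sw-demo-b'
New-Item -ItemType Directory -Path $root -Force | Out-Null
Set-Content -Path (Join-Path $root 'keep.ps1') -Value 'New-AzHierarchy -Name x; Get-AzContext'
Set-Content -Path (Join-Path $root 'skip.ps1') -Value 'New-AzResourceGroup -Name y'

# -Exclude narrows the input *files* (skip.ps1 is never read);
# Where-Object then removes unwanted *words* from the matched text.
$words = Select-String -Path (Join-Path $root '*.ps1') -Exclude '*skip.ps1' -Pattern '\w+-Az\w+' -AllMatches |
  ForEach-Object { $_.Matches.Value } |
  Where-Object { $_ -notin 'Get-AzAccessToken', 'Get-AzContext' }

$words

Remove-Item -Path $root -Recurse -Force
```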

Use PowerShell -Pattern to search for multiple values on multiple files

Team --
I have a snippet of code that works as designed. It scans all the files within a folder hierarchy for a particular word and then returns the de-duplicated paths of the files in which the word was found.
$properties = @(
'Path'
)
Get-ChildItem -Path \\Server_Name\Folder_Name\* -recurse |
Select-String -Pattern 'Hello' |
Select-Object $properties -Unique |
Export-Csv \\Server_Name\Folder_Name\File_Name.csv -NoTypeInformation
I'd like to:
Expand this code to search for multiple words at once, so that it finds all cases where 'Hello' OR 'Hola' occurs, and potentially an entire list of words.
Have the code return not only the file path but also the word that tripped it, with multiple lines for the same path if both words tripped it.
I've found some articles about multi-word searches using methods like:
where { $_ | Select-String -Pattern 'Hello' } |
where { $_ | Select-String -Pattern 'Hola' } |
OR
Select-String -Pattern '(Hello.Hola)|(Hola.Hello)'
This code runs, but no data is returned in the output file. It's blank except for the 'Path' header.
I'm missing something obvious ... anyone spare a fresh set of eyes?
MS
Select-String's -Pattern parameter accepts an array of patterns.
Each [Microsoft.PowerShell.Commands.MatchInfo] instance output by Select-String has a .Pattern property that indicates the specific pattern that matched.
Get-ChildItem -Path \\Server_Name\Folder_Name\* -recurse |
Select-String -Pattern 'Hello', 'Hola' |
Select-Object Path, Pattern |
Export-Csv \\Server_Name\Folder_Name\File_Name.csv -NoTypeInformation
Note:
If a given matching line matches multiple patterns, only the first matching pattern is reported for it, in input order.
While adding -AllMatches normally finds all matches on a line, as of PowerShell 7.2.x this doesn't work as expected with multiple patterns passed to -Pattern, with respect to the matches reported in the .Matches property - see GitHub issue #7765.
Similarly, the .Pattern property doesn't then reflect the (potentially) multiple patterns that matched; and, in fact, this isn't even possible at the moment, given that the .Pattern property is a [string]-typed member of [Microsoft.PowerShell.Commands.MatchInfo] and therefore cannot reflect multiple patterns.
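You may need one row per pattern even when several patterns match the same line. One workaround, offered here as a sketch rather than a built-in feature, is to re-test each matching line against every pattern. The temp folder and file names are hypothetical stand-ins for \\Server_Name\Folder_Name.

```powershell
# Hypothetical sample files: a.txt has both words on one line, b.txt has only one.
$root = Join-Path ([System.IO.Path]::GetTempPath()) 'sw-demo-c'
New-Item -ItemType Directory -Path $root -Force | Out-Null
Set-Content -Path (Join-Path $root 'a.txt') -Value 'Hello and Hola on one line'
Set-Content -Path (Join-Path $root 'b.txt') -Value 'Only Hola here'

$patterns = 'Hello', 'Hola'

# Select-String reports each matching line once. The inner loop then emits one row
# per pattern that matches that line (-match is case-insensitive, like Select-String).
# Sort-Object -Unique keeps one row per (Path, Pattern) pair.
$rows = Get-ChildItem -Path $root -Recurse -File |
  Select-String -Pattern $patterns |
  ForEach-Object {
    $info = $_
    foreach ($p in $patterns) {
      if ($info.Line -match $p) {
        [pscustomobject] @{ Path = $info.Path; Pattern = $p }
      }
    }
  } |
  Sort-Object -Property Path, Pattern -Unique

$rows | Export-Csv -Path (Join-Path $root 'File_Name.csv') -NoTypeInformation

Remove-Item -Path $root -Recurse -Force
```

Each matching line is tested once per pattern. That is fine for a handful of words but adds up for long pattern lists.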

Powershell, how to capture argument(s) of Select-String and include with matched output

Thanks to @mklement0 for the help in getting this far, with the answer given in Powershell search directory for code files with text matching input a txt file.
The below Powershell works well for finding the occurrences of a long list of database field names in a source code folder.
$inputFile = 'C:\DataColumnsNames.txt'
$outputFile = 'C:\DataColumnsUsages.txt'
Get-ChildItem C:\ProjectFolder -Filter *.cs -Recurse -Force -ea SilentlyContinue |
Select-String -Pattern (Get-Content $inputFile) |
Select-Object Path, LineNumber, line |
Export-csv $outputfile
However, many lines of source code have multiple matches, especially ADO.NET SQL statements with many field names on one line. If the matching field name were included in the output, the results would be more directly useful. They would also need less massaging, such as lining everything up with the original field name list. For example, the source line "BatchId = NewId" matches the field name list item "BatchId". Is there an easy way to include both "BatchId" and "BatchId = NewId" in the output?
I played with the Matches object, but it doesn't seem to have the information. I also tried a pipeline variable, like here, but $x is null.
$inputFile = 'C:\DataColumnsNames.txt'
$outputFile = 'C:\DataColumnsUsages.txt'
Get-ChildItem C:\ProjectFolder -Filter *.cs -Recurse -Force -ea SilentlyContinue |
Select-String -Pattern (Get-Content $inputFile -PipelineVariable x) |
Select-Object $x, Path, LineNumber, line |
Export-csv $outputFile
Thanks.
The Microsoft.PowerShell.Commands.MatchInfo instances that Select-String outputs have a Pattern property that reflects the specific pattern among the (potential) array of patterns passed to -Pattern that matched on a given line.
The caveat is that if multiple patterns match, .Pattern reports only the one listed first in the -Pattern argument.
Here's a simple example, using an array of strings to simulate lines from files as input:
'A fool and',
'his barn',
'are soon parted.',
'foo and bar on the same line' |
Select-String -Pattern ('bar', 'foo') |
Select-Object Line, LineNumber, Pattern
The above yields:
Line                         LineNumber Pattern
----                         ---------- -------
A fool and                            1 foo
his barn                              2 bar
foo and bar on the same line          4 bar
Note how 'bar' is listed as the Pattern value for the last line, even though 'foo' appeared first in the input line, because 'bar' comes before 'foo' in the pattern array.
To reflect the actual pattern that appears first on the input line in a Pattern property, more work is needed:
Formulate your array of patterns as a single regex using alternation (|), wrapped as a whole in a capture group ((...)), e.g., '(bar|foo)'.
Note: The expression used below, '({0})' -f ('bar', 'foo' -join '|'), constructs this regex dynamically, from an array (the array literal 'bar', 'foo' here, but you can substitute any array variable or even (Get-Content $inputFile)); if you want to treat the input patterns as literals and they happen to contain regex metacharacters (such as .), you'll need to escape them with [regex]::Escape() first.
Use a calculated property to define a custom Pattern property that reports the capture group's value, which is the first among the values encountered on each input line:
'A fool and',
'his barn',
'are soon parted.',
'foo and bar on the same line' |
Select-String -AllMatches -Pattern ('({0})' -f ('bar', 'foo' -join '|')) |
Select-Object Line, LineNumber,
@{ n='Pattern'; e={ $_.Matches[0].Groups[1].Value } }
This yields (abbreviated to show only the last match):
Line                         LineNumber Pattern
----                         ---------- -------
...
foo and bar on the same line          4 foo
Now, 'foo' is properly reported as the matching pattern.
To report all patterns found on each line:
Switch -AllMatches is required to tell Select-String to find all matches on each line, represented in the .Matches collection of the MatchInfo output objects.
The .Matches collection must then be enumerated (via the .ForEach() collection method) to extract the capture-group value from each match.
'A fool and',
'his barn',
'are soon parted.',
'foo and bar on the same line' |
Select-String -AllMatches -Pattern ('({0})' -f ('bar', 'foo' -join '|')) |
Select-Object Line, LineNumber,
@{ n='Pattern'; e={ $_.Matches.ForEach({ $_.Groups[1].Value }) } }
This yields (abbreviated to show only the last match):
Line                         LineNumber Pattern
----                         ---------- -------
...
foo and bar on the same line          4 {foo, bar}
Note how both 'foo' and 'bar' are now reported in Pattern, in the order encountered on the line.
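The earlier note mentions treating input patterns as literals. Here is a minimal sketch of that, with hypothetical terms containing the regex metacharacters . and +. It escapes each term with [regex]::Escape() before joining the terms into the capture-group alternation.

```powershell
# Hypothetical search terms containing the regex metacharacters '.' and '+'.
$terms = 'a.b', 'c+d'

# Escape each term, then join them into one capture-group alternation.
$pattern = '({0})' -f (($terms | ForEach-Object { [regex]::Escape($_) }) -join '|')
# $pattern is now '(a\.b|c\+d)'

# Without escaping, 'a.b' would also match 'axb'; with escaping it does not.
$found = 'axb', 'a.b here', 'c+d and a.b' |
  Select-String -Pattern $pattern |
  Select-Object Line, @{ n = 'Pattern'; e = { $_.Matches[0].Groups[1].Value } }

$found
```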
The solid information and examples from @mklement0 were enough to point me in the right direction for researching and understanding more about PowerShell, the object pipeline, and calculated properties.
I was finally able to achieve my goal of cross-referencing a list of table and field names against the C# code base. The input file is simply table and field names, pipe-delimited. (One of my glitches was not using a pipe in the split. It was a visual error that took a while to spot, so check for that.) The output is the table name, field name, code file name, line number, and actual line. It's not perfect, but it's much better than manual effort for a few hundred fields! And now there are possibilities for further automation in the data mapping and conversion project. I thought about writing a C# utility program, but that might have taken just as long to figure out and implement. It would also have been much more cumbersome than a working PowerShell script.
The key for me at this point is "working"! This was my first deeper dive into the abstruse world of PowerShell. The key points of my solution are:
- using a calculated property to get the table and field names into the output;
- realizing that expressions can be used in certain places, such as to build a pattern;
- understanding that the pipeline passes only certain specific objects after each command (maybe that's too restricted a view, but it's better than what I had before).
I hope this helps someone in the future. I couldn't find any examples close enough to get me over the hump, so I asked my first-ever Stack Overflow questions.
$inputFile = "C:\input.txt"
$outputFile = "C:\output.csv"
$results = Get-Content $inputfile
foreach ($i in $results) {
Get-ChildItem -Path "C:\ProjectFolder" -Filter *.cs -Recurse -ErrorAction SilentlyContinue -Force |
Select-String -Pattern $i.Split('|')[1] |
Select-Object @{ n='Pattern'; e={ $i.Split('|')[0], $i.Split('|')[1] -join '|'} }, Filename, LineNumber, line |
Export-Csv $outputFile -Append}
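The loop above rescans the whole folder once per field. Here is a sketch of a single-pass alternative, assuming the same 'Table|Field' input format:
- build one regex from all field names;
- scan once with -AllMatches;
- look up each matched field's table in a hashtable.

The temp files are hypothetical. The \b word boundaries are my own addition, so that a short field name does not match inside a longer identifier. Because this uses a single pattern (an alternation), -AllMatches is not affected by the multiple-pattern issue mentioned earlier.

```powershell
# Hypothetical stand-ins for C:\input.txt and C:\ProjectFolder.
$root = Join-Path ([System.IO.Path]::GetTempPath()) 'sw-demo-e'
New-Item -ItemType Directory -Path $root -Force | Out-Null
$inputFile = Join-Path $root 'input.txt'
Set-Content -Path $inputFile -Value 'Batch|BatchId', 'Batch|NewId'
Set-Content -Path (Join-Path $root 'Code.cs') -Value 'cmd.Text = "BatchId = NewId";', 'var x = 1;'

# Map each field name to its table name (hashtable keys are case-insensitive).
$tableOf = @{}
foreach ($entry in (Get-Content -Path $inputFile)) {
  $table, $field = $entry.Split('|')
  $tableOf[$field] = $table
}

# One regex for all fields: escaped names joined by '|' inside a capture group,
# wrapped in \b word boundaries (an assumption, see above).
$pattern = '\b({0})\b' -f (($tableOf.Keys | ForEach-Object { [regex]::Escape($_) }) -join '|')

$results = Get-ChildItem -Path $root -Filter *.cs -Recurse -File |
  Select-String -Pattern $pattern -AllMatches |
  ForEach-Object {
    $info = $_
    # One output row per match, so a line naming two fields yields two rows.
    foreach ($hit in $info.Matches) {
      $name = $hit.Groups[1].Value
      [pscustomobject] @{
        Table      = $tableOf[$name]
        Field      = $name
        FileName   = $info.Filename
        LineNumber = $info.LineNumber
        Line       = $info.Line
      }
    }
  }

$results | Export-Csv -Path (Join-Path $root 'output.csv') -NoTypeInformation

Remove-Item -Path $root -Recurse -Force
```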

search first "x" amount of characters in a line, and output entire line

I have a text file that could contain up to 1000 lines of data in the following format:
14410:3012669|EU14410|20/01/2017||||1|6|4|OUT FROM UNDER||22/02/2017 04:01:47|22/02/2017 21:19:52
14:3012670|EU016271751|20/01/2017||||2|6|4|BLOCK BET|\\acis-prod\Pictures\Entry\EU01627.jpg|22/02/2017 04:02:02|22/02/2017 21:19:52
301111:3012671|EU016275|20/01/2017||||2|6|4|VITAE MEDICAL CLINIC|\\tm-prod\Pictures\Entry\EU01.jpg|22/02/2017 04:02:11|22/02/2017 21:19:53
each line will start with the following format
"set of characters up to max of 8":"set of characters unlimited max"
I want to search ONLY the characters up to the first colon. There can be any number of them, up to a maximum of 8 (hopefully shown well in my examples above). I'm trying to search those leading characters, up to the ":" of each line, to see whether they contain a string, and then return the whole line. I'm still new to PowerShell, so I've only tried a simple Select-String:
$path = "C:\Users\ME\Desktop\acsep22\acsnic-20170222_233324.done"
Get-ChildItem $path -recurse | Select-String -pattern ("14410","3011981","3011982") | format-list | out-file $logfile
This works, but I didn't take into account that the string could also appear twice in the same line (though unrelated to the first 7 characters).
for example:
14410:3012669|EU14410|
This line contains 14410 twice. The two occurrences are unrelated in significance, and I only want to search and return based on the first number.
Could somebody help me achieve this, or point me toward the cmdlet that would help?
I've tried various searches online (and via the Microsoft online resource) but a lot of results are more to do with "return the first X amount of characters" rather than "search using only the first X amount and return line"
Kind Regards
You could use a simple Where-Object filter to check whether the string before the : is one of the strings you expect:
$strings = '14410','3011981','3011982'
Get-Content $path |Where-Object {$strings -contains ($_ -split ':')[0]}
This is probably the most PowerShell-idiomatic approach.
If you want to use Select-String, you'll have to construct a regex pattern that will match on strings that start with one of the strings and then a colon:
$strings = '14410','3011981','3011982'
$pattern = '^(?:{0}):' -f ($strings -join '|') # pattern is now '^(?:14410|3011981|3011982):'
Select-String -Path $path -Pattern $pattern
If you just want the bare string itself from the output, grab the Line property from the objects returned by Select-String:
Select-String -Path $path -Pattern $pattern |Select-Object -Expand Line
or
Select-String -Path $path -Pattern $pattern |ForEach-Object Line
The pattern above uses a non-capturing group (?:pattern-goes-here) to match any one of the strings, at the start ^ of a string, followed by :.
Both solutions work with an arbitrary number of strings.
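Here is a quick check of both approaches. The question's sample lines (shortened) are held in an array instead of being read from $path. One hypothetical extra line is added, in which 14410 appears only after the colon. A plain Select-String on the strings matches that extra line, but the two anchored approaches don't.

```powershell
$lines = @(
  '14410:3012669|EU14410|20/01/2017||||1|6|4|OUT FROM UNDER'
  '14:3012670|EU016271751|20/01/2017||||2|6|4|BLOCK BET'
  '301111:3012671|EU016275|20/01/2017||||2|6|4|VITAE MEDICAL CLINIC'
  '999:3012672|EU14410|20/01/2017'   # hypothetical: 14410 appears only after the colon
)
$strings = '14410', '3011981', '3011982'

# Unanchored search: also hits the hypothetical line, via 'EU14410'.
$naive = $lines | Select-String -Pattern $strings | ForEach-Object Line

# Approach 1: compare only the text before the first ':'.
$viaFilter = $lines | Where-Object { $strings -contains ($_ -split ':')[0] }

# Approach 2: regex anchored at the start of the line and followed by ':'.
$pattern = '^(?:{0}):' -f ($strings -join '|')
$viaRegex = $lines | Select-String -Pattern $pattern | ForEach-Object Line

"naive: $(@($naive).Count), filter: $(@($viaFilter).Count), regex: $(@($viaRegex).Count)"
# → naive: 2, filter: 1, regex: 1
```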

PowerShell: return the number of instances found in a file for a search pattern

I have a text file with the following contents:
something
another something
stuff
more stuff
Using PowerShell, I have a script that searches for the pattern "something". This pattern will appear at most once per line on the file. I am trying to determine the number of times that this search pattern was found in the file (i.e., the number of lines that contain this pattern). I am using the following script:
$something_list = Select-String -Path $some_path -Pattern "something" | Select-Object Line
I then run the following command to get the number of elements in the Line property:
$n = $something_list.Length - 1
The problem I'm having is that this works if there are 2+ instances of "something" in the file. If there is only 1 instance of "something" in the file, $something_list.Length is meaningless, since Length can't be referenced for Line objects with only 1 element in them.
How can I resolve this?
You can use the Measure-Object cmdlet to get the count:
Select-String -Path $some_path -Pattern "something" | Measure-Object | select -expand count
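Another option is the array-subexpression operator @(...), which always produces an array, so .Count is reliable for zero, one, or many matches. Neither approach needs the - 1 from the question, because the count already equals the number of matching lines. The sketch below uses a temp file (hypothetical) holding the question's sample content.

```powershell
# Hypothetical stand-in for $some_path, with the sample content from the question.
$some_path = Join-Path ([System.IO.Path]::GetTempPath()) 'sw-demo-g.txt'
Set-Content -Path $some_path -Value 'something', 'another something', 'stuff', 'more stuff'

# Measure-Object, as in the answer above.
$n1 = Select-String -Path $some_path -Pattern 'something' | Measure-Object | Select-Object -ExpandProperty Count

# @(...) always yields an array, so .Count works even for a single match.
$n2 = @(Select-String -Path $some_path -Pattern 'something').Count

Remove-Item -Path $some_path
"$n1 $n2"   # → 2 2
```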