Use Select-String to get the single word matching pattern from files - powershell

I am trying to get only the word when using Select-String, but instead it is returning the whole string
Select-String -Path .\*.ps1 -Pattern '-Az' -Exclude "Get-AzAccessToken","-Azure","Get-AzContext"
I want to get all words in all .ps1 files that contain '-Az', for example 'New-AzHierarchy'

Select-String outputs objects of type Microsoft.PowerShell.Commands.MatchInfo by default, which supplement the whole line (input object) on which a match was found (.Line property) with metadata about the match (in PowerShell (Core) 7+, you can use -Raw to directly output the matching lines (input objects) only).
Note that in the default display output, it appears that only the matching lines are printed, with PowerShell (Core) 7+ now highlighting the part that matched the pattern(s).
Select-String's -Include / -Exclude parameters do not modify what patterns are matched; instead, they modify the -Path argument to further narrow down the set of input files. Since a wildcard expression as part of the -Path argument is usually sufficient, these parameters are rarely used.
Therefore:
Use the objects in the .Matches collection property of Select-String's output objects to access the part of the line that actually matched the given pattern(s).
Since you want to capture entire command names that contain substring -Az, such as New-AzHierarchy, you must use a regex pattern that also captures the relevant surrounding characters: \w+-Az\w+
The simplest way to exclude specific matches is to filter them out afterwards, using a Where-Object call.
# Note: -AllMatches ensures that if there are *multiple* matches
# on a single line, they are all reported.
Select-String -Path .\*.ps1 -Pattern '\w+-Az\w+' -AllMatches |
ForEach-Object { $_.Matches.Value } |
Where-Object { $_ -notin 'Get-AzAccessToken', '-Azure', 'Get-AzContext' }
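To see the approach end to end, here is a minimal sketch that runs against a throwaway file. The temporary file name, its contents, and the cmdlet name Remove-AzThing are made up for illustration only.

```powershell
# Hypothetical sample file, created only for this demonstration.
$tmp = Join-Path ([System.IO.Path]::GetTempPath()) "demo-$([guid]::NewGuid()).ps1"
Set-Content -LiteralPath $tmp -Value @(
    'New-AzHierarchy -Name foo; Get-AzContext'
    'Write-Host "nothing here"'
    'Get-AzAccessToken | Remove-AzThing'
)

# -AllMatches reports every match on a line; .Matches.Value yields only the matched text.
$found = Select-String -LiteralPath $tmp -Pattern '\w+-Az\w+' -AllMatches |
    ForEach-Object { $_.Matches.Value } |
    Where-Object { $_ -notin 'Get-AzAccessToken', 'Get-AzContext' }

$found   # the command names that survive the filter

Remove-Item -LiteralPath $tmp
```

Filtering after the fact keeps the regex simple; the exclusion list can grow without touching the pattern.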

Related

Powershell - Scan for multiple strings in multiple files

I am having issues resolving a revision to the script below.
This script will take in a number of key words, either manually added or read from a file.
It will output the data when it finds a match by listing the File Name, Line number and the search word. Unfortunately if I'm searching for multiple words it has to scan the files for each separate word. That means if I have 20 search words, it will open, scan, close each file 20 times. Once for each word.
Not good as it takes time and I still have to troll through the output to total how many matches per file.
Every change I make is disastrous as it prints every single search word without knowing what word was the match or worse it fails to run.
Would anyone be able to help me alter the script to scan the files once for ALL the search words and list out only the matches in a readable way like the output below?
$searchWords="Address", "City","State"
Foreach ($sw in $searchWords)
{
Get-Childitem -Path "C:\Workspace\src" -Recurse -include "*.cbl" |
Select-String -Pattern "$sw" |
Select Path,LineNumber,@{n='SearchWord';e={$sw}}
}
-Pattern accepts an array of patterns, and which of those patterns caused a given match can be accessed via the .Pattern property of Select-String's output objects:[1]
Get-Childitem -Path "C:\Workspace\src" -Recurse -include "*.cbl" |
Select-String -Pattern "Address", "City", "State" |
Select Path, LineNumber, @{n='SearchWord';e={$_.Pattern}}
Note: I'm passing the search words as a literal array here, for brevity; in your code, simply replace "Address", "City", "State" with $searchWords (without enclosure in "...").
As an aside: Using -Filter instead of -Include can speed up your command, and, given that your command arguments don't contain spaces or other metacharacters, quoting them is optional:
Get-Childitem -Path C:\Workspace\src -Recurse -Filter *.cbl |
Select-String -Pattern Address, City, State |
Select Path, LineNumber, @{n='SearchWord';e={$_.Pattern}}
[1] Note: Only the first of the specified patterns that matches on a given line, in the order specified, is reported as the matching pattern for that line - even if others would match too. Even -AllMatches doesn't change that - it would only report multiple matches per line for that first pattern.
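The first-pattern-wins behavior described in the footnote can be observed with in-memory strings instead of files. This is only an illustrative sketch; the sample lines are invented.

```powershell
# Invented sample input; the second line contains two of the search words.
$lines = 'Address: 1 Main St', 'City and State', 'nothing to see'

$hits = $lines | Select-String -Pattern 'Address', 'City', 'State'

# Show which pattern is reported for each matching line.
$hits | ForEach-Object { '{0} <- {1}' -f $_.Pattern, $_.Line }
```

For the line containing both City and State, only City is reported, because it comes first in the -Pattern array.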

Use PowerShell -Pattern to search for multiple values on multiple files

Team --
I have a snippet of code that works as designed. It will scan all the files within a folder hierarchy for a particular word and then return the de-duped file path of the files where any instance of the word was found.
$properties = @(
'Path'
)
Get-ChildItem -Path \\Server_Name\Folder_Name\* -recurse |
Select-String -Pattern ‘Hello’ |
Select-Object $properties -Unique |
Export-Csv \\Server_Name\Folder_Name\File_Name.csv -NoTypeInformation
I'd like to
Expand this code to be able to search for multiple words at once. So all cases where 'Hello' OR 'Hola' are found... and potentially an entire list of words if possible.
Have the code return not only the file path but the word that tripped it ... with multiple lines for the same path if both words tripped it
I've found some articles talking about doing multiple word searches using methods like:
where { $_ | Select-String -Pattern 'Hello' } |
where { $_ | Select-String -Pattern 'Hola' } |
OR
Select-String -Pattern ‘(Hello.Hola)|(Hola.Hello)’
These commands will run ... but no data is returned in the output file ... it is just blank with the header 'Path'.
I'm missing something obvious ... anyone spare a fresh set of eyes?
MS
Select-String's -Pattern parameter accepts an array of patterns.
Each [Microsoft.PowerShell.Commands.MatchInfo] instance output by Select-String has a .Pattern property that indicates the specific pattern that matched.
Get-ChildItem -Path \\Server_Name\Folder_Name\* -recurse |
Select-String -Pattern 'Hello', 'Hola' |
Select-Object Path, Pattern |
Export-Csv \\Server_Name\Folder_Name\File_Name.csv -NoTypeInformation
Note:
If a given matching line matches multiple patterns, only the first matching pattern is reported for it, in input order.
While adding -AllMatches normally finds all matches on a line, as of PowerShell 7.2.x this doesn't work as expected with multiple patterns passed to -Pattern, with respect to the matches reported in the .Matches property - see GitHub issue #7765.
Similarly, the .Pattern property doesn't then reflect the (potentially) multiple patterns that matched; and, in fact, this isn't even possible at the moment, given that the .Pattern property is a [string]-typed member of [Microsoft.PowerShell.Commands.MatchInfo] and therefore cannot reflect multiple patterns.
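One possible workaround for the limitation above, offered as a sketch rather than as part of the original answer, is to merge the words into a single alternation regex and read the matched word from .Matches instead of .Pattern. The sample lines are invented; with real files you would pipe Get-ChildItem output to Select-String and emit $_.Path instead of $_.Line.

```powershell
$words = 'Hello', 'Hola'

# Escape each word so that it is matched literally, then join with the regex OR operator.
$regex = ($words | ForEach-Object { [regex]::Escape($_) }) -join '|'

$rows = 'Hello and Hola', 'just Hola', 'neither' |
    Select-String -Pattern $regex -AllMatches |
    ForEach-Object {
        # One output object per matched word on the line.
        foreach ($m in $_.Matches) {
            [pscustomobject]@{ Line = $_.Line; Word = $m.Value }
        }
    }

$rows
```

Because a single pattern is used, -AllMatches reports every word found on a line, so a line containing both words produces two rows.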

Script returning error: "Get-Content : An object at the specified path ... does not exist, or has been filtered by the -Include or -Exclude parameter"

EDIT
I think I now know what the issue is - The copy numbers are not REALLY part of the filename. Therefore, when the array pulls it and then is used to get the match info, the file as it is in the array does not exist, only the file name with no copy number.
I tried writing a rename script but the same issue exists... only the few files I manually renamed (so they don't contain copy numbers) were renamed (successfully) by the script. All others are shown not to exist.
How can I get around this? I really do not want to manually work with 23000+ files. I am drawing a blank..
HELP PLEASE
I am trying to narrow down a folder full of emails (copies) with the same name "SCADA Alert.eml", "SCADA Alert[1].eml"...[23110], based on contents. And delete the emails from the folder that meet specific content criteria.
When I run it I keep getting the error in the subject line above. It only sees the first file and the rest it says do not exist...
The script reads through the folder, creates an array of names (does this correctly).
Then it creates a variable, $email, and assigns it the content of that file, for each $FileName in the array.
(this is where it breaks)
Then it should match the specific string I am looking for against the content of the $email var and return true or false. If true, I want it to remove the email, $FileName, from the folder.
Thus narrowing down the email I have to review.
Any help here would be greatly appreciated.
This is what I have so far... (Folder is in the root of C:)
$array = Get-ChildItem -name -Path $FolderToRead #| Get-Content | Tee C:\Users\baudet\desktop\TargetFile.txt
Foreach ($FileName in $array){
$FileName # Check File
$email = Get-Content $FolderToRead\$FileName
$email # Check Content
$ContainsString = "False" # Set Var
$ContainsString # Verify Var
$ContainsString = %{$email -match "SYS$,ROC"} # Look for String
$ContainsString # Verify result of match
#if ($ContainsString -eq "True") {
#Remove-Item $FolderToRead\$element
#}
}
Here's a PowerShell-idiomatic solution that also resolves your original problems:
Get-ChildItem -File -LiteralPath $FolderToRead | Where-Object {
(Get-Content -Raw -LiteralPath $_.FullName) -match 'SYS\$,ROC'
} | Remove-Item -WhatIf
Note: The -WhatIf common parameter in the command above previews the operation. Remove -WhatIf once you're sure the operation will do what you want.
Note how the $ character in the RHS regex of the -match operator is \-escaped in order to use it verbatim (rather than as metacharacter $, the end-of-input anchor).
Also, given that $ is also used in PowerShell's string interpolation, it's better to use '...' strings (single-quoted, verbatim strings) to represent regexes, assuming no actual up-front string expansion is needed before the regex engine sees the resulting string - see this answer for more information.
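The effect of escaping the $ can be checked directly with the -match operator. This is a small sketch with an invented sample string; [regex]::Escape is shown as an alternative to escaping by hand.

```powershell
$text = 'Alarm from SYS$,ROC at 04:01'   # invented sample content

# Unescaped: $ is the end-of-input anchor, so this pattern cannot match here.
$unescaped = $text -match 'SYS$,ROC'

# Escaped by hand: \$ matches a literal dollar sign.
$escaped = $text -match 'SYS\$,ROC'

# Escaped programmatically: useful when the search text comes from a variable.
$auto = $text -match [regex]::Escape('SYS$,ROC')

$unescaped, $escaped, $auto
```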
As for what you tried:
The error message stemmed from the fact that Get-Content $FolderToRead\$FileName binds the file-name argument, $FolderToRead\$FileName, implicitly (positionally) to Get-Content's -Path parameter, which expects PowerShell wildcard patterns.
Since your file names literally contain [ and ] characters, they are misinterpreted by the (implied) -Path parameter, which can be avoided by using the -LiteralPath parameter instead (which must be specified explicitly, as a named argument).
%{$email -match "SYS$,ROC"} is unnecessarily wrapped in a ForEach-Object call (% is a built-in alias); while that doesn't do any harm in this case, it adds unnecessary overhead;
$email -match "SYS$,ROC" is enough, though it needs to be corrected to
$email -match 'SYS\$,ROC', as explained above.
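The difference between -Path and -LiteralPath for file names containing [ and ] can be reproduced in a scratch folder. The following is a sketch only; the temporary folder and the file content are assumptions made for the demonstration.

```powershell
# Scratch folder and file, created only for this demonstration.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) "demo-$([guid]::NewGuid())"
$null = New-Item -ItemType Directory -Path $dir
$file = Join-Path $dir 'SCADA Alert[1].eml'
[System.IO.File]::WriteAllText($file, 'Alarm from SYS$,ROC')

# -Path treats [1] as a wildcard character set, so it looks for 'SCADA Alert1.eml'.
$asWildcard = Test-Path -Path $file

# -LiteralPath uses the name verbatim.
$asLiteral = Test-Path -LiteralPath $file
$content = Get-Content -Raw -LiteralPath $file

$asWildcard, $asLiteral, $content

[System.IO.Directory]::Delete($dir, $true)   # clean up
```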
[System.IO.Directory]::EnumerateFiles($FolderToRead) |
Where-Object { [System.IO.File]::ReadAllText($_, [System.Text.Encoding]::UTF8).Contains('SYS$,ROC') } |
ForEach-Object {
Write-Host "Removing $($_)"
#[System.IO.File]::Delete($_)
}
Your mistakes:
%{$email -match "SYS$,ROC"} - What is % intended to be? It is an alias for ForEach-Object.
%{$email -match "SYS$,ROC"} - Why use -match? It is much slower than -like or String.Contains().
%{$email -match "SYS$,ROC"} - When using $ inside double quotes, you should escape it with a single backtick (I have `$100). Otherwise, everything after $ is treated as a variable name: Hello, $username; It's $($weather.ToString()) today!
Write debug output the right way: use Write-Debug, Write-Verbose, Write-Host, Write-Warning, Write-Error, Write-Information.
Can be better:
Avoid using Get-ChildItem, because Get-ChildItem returns files with attributes (like mtime, atime, ctime, etc). Fetching this additional info costs an extra request per file. When you need only a list of files, use the native .NET EnumerateFiles method from System.IO.Directory. This is a significant performance boost with huge numbers of files.
Use ReadAllText, ReadAllLines or ReadAllBytes from the System.IO.File static class to be more specific, instead of using the general-purpose Get-Content.
Use pipelines ;-)

search first "x" amount of characters in a line, and output entire line

I have a text file that could contain up to 1000 lines of data in the following format:
14410:3012669|EU14410|20/01/2017||||1|6|4|OUT FROM UNDER||22/02/2017 04:01:47|22/02/2017 21:19:52
14:3012670|EU016271751|20/01/2017||||2|6|4|BLOCK BET|\\acis-prod\Pictures\Entry\EU01627.jpg|22/02/2017 04:02:02|22/02/2017 21:19:52
301111:3012671|EU016275|20/01/2017||||2|6|4|VITAE MEDICAL CLINIC|\\tm-prod\Pictures\Entry\EU01.jpg|22/02/2017 04:02:11|22/02/2017 21:19:53
each line will start with the following format
"set of characters up to max of 8":"set of characters unlimited max"
I want to search the characters ONLY up until the first colon. Those characters could contain any amount up to a maximum of 8. (hopefully shown well in my examples above) I'm trying to search those first characters, up to the ":" of each line to see if it contains a string, and return the whole line. still new to powershell so I've only tried a simple select:
$path = "C:\Users\ME\Desktop\acsep22\acsnic-20170222_233324.done"
Get-ChildItem $path -recurse | Select-String -pattern ("14410","3011981","3011982",) | out-file $logfile |format-list
which works - but I didn't take into account that the string could also appear twice in the same line ( though unrelated to the first 7 characters)
for example:
14410:3012669|EU14410|
contains 14410 twice, they're unrelated in terms of their significance and I only want to search and return based on the first number
could somebody help me achieve this or could some one point me toward the cmdlet that would help?
I've tried various searches online (and via the Microsoft online resource) but a lot of results are more to do with "return the first X amount of characters" rather than "search using only the first X amount and return line"
Kind Regards
You could use a simple Where-Object filter to check whether the string before the : is one of the strings you expect:
$strings = '14410','3011981','3011982'
Get-Content $path |Where-Object {$strings -contains ($_ -split ':')[0]}
This is probably the most PowerShell-idiomatic approach.
If you want to use Select-String, you'll have to construct a regex pattern that will match on strings that start with one of the strings and then a colon:
$strings = '14410','3011981','3011982'
$pattern = '^(?:{0}):' -f ($strings -join '|') # pattern is now '^(?:14410|3011981|3011982):'
Select-String -Path $path -Pattern $pattern
If you just want the bare string itself from the output, grab the Line property from the objects returned by Select-String:
Select-String -Path $path -Pattern $pattern |Select-Object -Expand Line
or
Select-String -Path $path -Pattern $pattern |ForEach-Object Line
The pattern above uses a non-capturing group (?:pattern-goes-here) to match any one of the strings, at the start ^ of a string, followed by :.
Both solutions will work with an arbitrary number of strings.
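Both approaches can be compared on in-memory sample lines. In this sketch the lines are shortened, partly invented variants of the data in the question, and [regex]::Escape is an added precaution that the original answer does not use.

```powershell
# Shortened sample lines; the third has 14410 only AFTER the first colon.
$lines = @(
    '14410:3012669|EU14410|20/01/2017'
    '14:3012670|EU016271751|20/01/2017'
    '301111:14410|EU016275|20/01/2017'
)
$strings = '14410', '3011981', '3011982'

# Approach 1: compare the text before the first colon against the list.
$byFilter = $lines | Where-Object { $strings -contains ($_ -split ':')[0] }

# Approach 2: anchored regex; escaping guards against regex metacharacters in the list.
$pattern = '^(?:{0}):' -f (($strings | ForEach-Object { [regex]::Escape($_) }) -join '|')
$byRegex = ($lines | Select-String -Pattern $pattern).Line

$byFilter
$byRegex
```

Only the first sample line is returned by either approach, since the other occurrences of 14410 are not at the start of the line.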

Extract lines matching a pattern from all text files in a folder to a single output file

I am trying to extract each line starting with "%%" in all files in a folder and then copy those lines to a separate text file. I am currently using this PowerShell code, but I am not getting any results.
$files = Get-ChildItem "folder" -Filter *.txt
foreach ($file in $files)
{
if ($_ -like "*%%*")
{
Set-Content "Output.txt"
}
}
I think that mklement0's suggestion to use Select-String is the way to go. Adding to his answer, you can pipe the output of Get-ChildItem into the Select-String so that the entire process becomes a Powershell one liner.
Something like this:
Get-ChildItem "folder" -Filter *.txt | Select-String -Pattern '^%%' | Select -ExpandProperty line | Set-Content "Output.txt"
The Select-String cmdlet offers a much simpler solution (PSv3+ syntax):
(Select-String -Path folder\*.txt -Pattern '^%%').Line | Set-Content Output.txt
Select-String accepts a filename/path pattern via its -Path parameter, so, in this simple case, there is no need for Get-ChildItem.
If, by contrast, your input file selection is recursive or uses more complex criteria, you can pipe Get-ChildItem's output to Select-String, as demonstrated in Dave Sexton's helpful answer.
Note that, according to the docs, Select-String by default assumes that the input files are UTF-8-encoded, but you can change that with the -Encoding parameter; also consider the output encoding discussed below.
Select-String's -Pattern parameter expects a regular expression rather than a wildcard expression.
^%% only matches literal %% at the start (^) of a line.
Select-String outputs [Microsoft.PowerShell.Commands.MatchInfo] objects that contain information about each match; each object's .Line property contains the full text of an input line that matched.
Set-Content Output.txt sends all matching lines to the single output file Output.txt.
Set-Content uses the system's legacy Windows codepage (an 8-bit single-byte encoding - even though the documentation mistakenly claims that ASCII files are produced).
If you want to control the output encoding explicitly, use the -Encoding parameter; e.g., ... | Set-Content Output.txt -Encoding Utf8.
By contrast, >, the output redirection operator always creates UTF-16LE files (an encoding PowerShell calls Unicode), as does Out-File by default (which can be changed with -Encoding).
Also note that > / Out-File apply PowerShell's default formatting to the input objects to obtain the string representation to write to the output file, whereas Set-Content treats the input as strings (calls .ToString() on input objects, if necessary). In the case at hand, since all input objects are already strings, there is no difference (except for the character encoding, potentially).
As for what you've tried:
$_ inside your foreach ($file in $files) refers to a file (a [System.IO.FileInfo] object), so you're effectively evaluating your wildcard expression *%%* against the input file's name rather than its contents.
Aside from that, wildcard pattern *%%* will match %% anywhere in the input string, not just at its start (you'd have to use %%* instead).
The Set-Content "Output.txt" call is missing input, because it is not part of a pipeline and, in the absence of pipeline input, no -Value argument was passed.
Even if you did provide input, however, output file Output.txt would get rewritten as a whole in each iteration of your foreach loop.
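The whole pipeline can be tried out in a scratch folder. This is a sketch under stated assumptions: the temporary folder, the file names, and the file contents are invented, and the output file is deliberately given a different extension so that it cannot be picked up by the *.txt wildcard.

```powershell
# Scratch folder with two invented input files.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) "demo-$([guid]::NewGuid())"
$null = New-Item -ItemType Directory -Path $dir
Set-Content -LiteralPath (Join-Path $dir 'a.txt') -Value '%% first', 'plain line', 'x %% not at start'
Set-Content -LiteralPath (Join-Path $dir 'b.txt') -Value 'other', '%% second'

$out = Join-Path $dir 'Output.out'

# Collect the matching lines from all files, then write them in one go.
(Select-String -Path (Join-Path $dir '*.txt') -Pattern '^%%').Line |
    Set-Content -LiteralPath $out -Encoding Utf8

$result = Get-Content -LiteralPath $out
$result

Remove-Item -LiteralPath $dir -Recurse   # clean up
```

Writing once at the end of the pipeline avoids the problem of the output file being rewritten in every loop iteration.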
First you have to use Get-Content to read the content of each file. Then do the string match on each line and write the matching lines to the output file. To do that, put another loop inside the outer loop to iterate over all the lines in the file.
I hope this logic helps you
ls *.txt | %{
    $f = $_
    # Read the file line by line and test each line.
    gc $f.fullname | %{
        if($_.StartsWith("%%")){
            $_ >> Output.txt
        }#end if
    }#end gc
}#end ls
Alias
ls - Get-ChildItem
gc - Get-Content
% - ForEach-Object
$_ - Current pipeline object inside the ForEach-Object script block
>> - Redirection construct
# - Comment
http://ss64.com/ps/