Get count of exact word match within a Text file - powershell

The requirement is to get the exact match count for a word "test". So in the following example it should be 1:
testing 1 2 3 "test" testing
Tester testing 2345 tes testers testings testing
test
I tried the below code :
(Get-Content "C:\Users\abc\Desktop\POC\Findstring.txt" |
Select-String -Pattern "test" -AllMatches).matches.count
But it returns 9, because the pattern behaves like a LIKE match (it also counts tester, testing, etc.).
How can we ensure that we get the count of exact matches only, rather than LIKE-style matches (as in SQL)?

tl;dr
Use regex \btest\b as the -Pattern argument so as to match test as a whole word only.
Pass your input file path directly to Select-String's -LiteralPath parameter, which is much faster and more efficient than streaming the individual lines from the file via Get-Content.
(
Select-String -AllMatches `
-Pattern '\btest\b' `
-LiteralPath C:\Users\abc\Desktop\POC\Findstring.txt
).Matches.Count
Note: The command is spread across multiple lines for readability. To convert it to a single-line form, also remove the line-ending ` (backtick) characters, which act as line continuations.
Your intent is to limit matching test substrings to whole words.
Since Select-String uses regexes (regular expressions), you can do so by enclosing the substring in word-boundary assertions, \b, as Theo advises, i.e. '\btest\b'
For a detailed explanation of this regex and the ability to interact with it, see this regex101.com page
Also note that Select-String - like PowerShell in general - is case-insensitive by default; to match case-sensitively, add the -CaseSensitive switch.
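As a rough, self-contained illustration, the effect of the word-boundary assertions and of -CaseSensitive can be compared on the sample lines from the question. Piping the lines in as strings instead of reading them from a file is an assumption made only so that the snippet runs anywhere:

```powershell
# Sample lines from the question, held in memory instead of a file (assumption for this demo).
$lines = @(
    'testing 1 2 3 "test" testing'
    'Tester testing 2345 tes testers testings testing'
    'test'
)

# Plain substring pattern: also counts the "test" inside testing, Tester, etc.
$substringCount = ($lines | Select-String -Pattern 'test' -AllMatches).Matches.Count

# Whole-word pattern: only counts test as a word of its own (double-quoted or not).
$wholeWordCount = ($lines | Select-String -Pattern '\btest\b' -AllMatches).Matches.Count

# Case-sensitive substring pattern: no longer counts the "Test" in Tester.
$caseSensitiveCount = ($lines | Select-String -Pattern 'test' -AllMatches -CaseSensitive).Matches.Count

"substring: $substringCount, whole word: $wholeWordCount, case-sensitive substring: $caseSensitiveCount"
```

With these three lines, the whole-word count still includes the double-quoted instance, because \b also matches between a " and a letter.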
Variation with also ignoring the word test when enclosed in "..."
If you additionally want to ignore "test" substrings (i.e. double-quoted instances of the word), you must amend your regex to also include a negative look-behind assertion, (?<!...), in order to preclude a " preceding the word:
(
Select-String -AllMatches `
-Pattern '(?<!")\btest\b' `
-LiteralPath C:\Users\abc\Desktop\POC\Findstring.txt
).Matches.Count
See this regex101.com page.
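A minimal sketch of the look-behind variant, again using the question's sample lines as in-memory strings (an assumption for the demo, not the file-based call above):

```powershell
# Sample lines from the question, held in memory instead of a file (assumption for this demo).
$lines = @(
    'testing 1 2 3 "test" testing'
    'Tester testing 2345 tes testers testings testing'
    'test'
)

# (?<!") rejects a match that is immediately preceded by a double quote.
$hits = $lines | Select-String -Pattern '(?<!")\btest\b' -AllMatches

# @(...) keeps .Count reliable even if there is only one match or none.
$count = @($hits.Matches).Count

# Show the count and the line(s) on which an unquoted whole-word match was found.
"count: $count"
$hits.Line
```

Note that this pattern only looks at the character before the word, so an instance that has a trailing " but no leading one would still be counted.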

Currently, you search for the pattern test, which also matches inside testing, testers, etc. The following should do the trick:
((Get-Content "C:\tmp\testdata.txt") -split " " | Select-String -Pattern '^(test)$' -AllMatches).count
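To see what this split-based approach counts, here is a small sketch that applies it to the question's sample lines held in memory (an assumption, so that no file is needed):

```powershell
# Sample lines from the question, held in memory instead of a file (assumption for this demo).
$lines = @(
    'testing 1 2 3 "test" testing'
    'Tester testing 2345 tes testers testings testing'
    'test'
)

# Split every line into space-separated tokens, yielding one flat array of tokens.
$tokens = $lines -split ' '

# A token only counts if it is exactly test; @(...) keeps .Count reliable for 0 or 1 hits.
$count = @($tokens | Select-String -Pattern '^(test)$').Count

"exact-token count: $count"
```

Because the anchors apply to the whole token, a token such as "test" (with its quotes) or one with attached punctuation is not counted.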

Related

Use Select-String to get the single word matching pattern from files

I am trying to get only the word when using Select-String, but instead it is returning the whole string
Select-String -Path .\*.ps1 -Pattern '-Az' -Exclude "Get-AzAccessToken","-Azure","Get-AzContext"
I want to get all words in all .ps1 files that contain '-Az', for example 'New-AzHierarchy'
Select-String outputs objects of type Microsoft.PowerShell.Commands.MatchInfo by default, which supplement the whole line (input object) on which a match was found (.Line property) with metadata about the match (in PowerShell (Core) 7+, you can use -Raw to directly output the matching lines (input objects) only).
Note that in the default display output, it appears that only the matching lines are printed, with PowerShell (Core) 7+ now highlighting the part that matched the pattern(s).
Select-String's -Include / -Exclude parameters do not modify what patterns are matched; instead, they modify the -Path argument to further narrow down the set of input files. Since a wildcard expression as part of the -Path argument is usually sufficient, these parameters are rarely used.
Therefore:
Use the objects in the .Matches collection property of Select-String's output objects to access the part of the line that actually matched the given pattern(s).
Since you want to capture entire command names that contain substring -Az, such as New-AzHierarchy, you must use a regex pattern that also captures the relevant surrounding characters: \w+-Az\w+
The simplest way to exclude specific matches is to filter them out afterwards, using a Where-Object call.
# Note: -AllMatches ensures that if there are *multiple* matches
# on a single line, they are all reported.
Select-String -Path .\*.ps1 -Pattern '\w+-Az\w+' -AllMatches |
ForEach-Object { $_.Matches.Value } |
Where-Object { $_ -notin 'Get-AzAccessToken', '-Azure', 'Get-AzContext' }
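The same pipeline can be tried without any .ps1 files by piping in a few sample lines. The lines below are made up for illustration, and New-AzHierarchy / Remove-AzHierarchy are just the placeholder names from the question, not real cmdlets as far as this sketch is concerned:

```powershell
# Hypothetical script lines standing in for the contents of the .ps1 files.
$scriptLines = @(
    '$token = Get-AzAccessToken'
    'New-AzHierarchy -Name demo; Remove-AzHierarchy -Name demo'
    'Write-Host "no match here"'
)

$names = $scriptLines |
    Select-String -Pattern '\w+-Az\w+' -AllMatches |
    ForEach-Object { $_.Matches.Value } |                              # just the matched text
    Where-Object { $_ -notin 'Get-AzAccessToken', 'Get-AzContext' }    # drop unwanted names

$names
```

Since matching is case-insensitive by default, add -CaseSensitive if a lowercase -az should not be treated as a match.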

How to split a text file into two in PowerShell?

I have one text file with Script that I want to split into two
Below is the dummy script
--serverone
this is first part of my script
--servertwo
this is second part of my script
I want to create two text files that would look like
file1
--serverone
this is first part of my script
file2
--servertwo
this is second part of my script
So far, I have added a special character within the script that I know doesn't exist ("}")
$script = get-content -Path "C:\Users\shamvil\Desktop\test.txt"
$newscript = $script.Replace("--servertwo","}--servertwo")
$newscript.split("}")
but I don't know how to save the split into two separate places.
This might not be the best approach, so I am open to a different solution as well.
Please help, thanks!
Use a regex-based -split operation:
$i = 0
(Get-Content -Raw test.txt) -split '(?m)^(?=--)' -ne '' |
ForEach-Object { $fileName = 'file' + (++$i); Set-Content $fileName $_ }
This assumes that each block of lines that starts with a line that starts with -- is to be saved to a separate file.
Get-Content -Raw reads the entire file into a single, multi-line string.
As for the separator regex passed to -split:
The (?m) inline regex option makes anchors ^ and $ match on each line
^(?=--) therefore matches every line that starts with --, using a by definition non-capturing look-ahead assertion ((?=...)) to ensure that the -- isn't removed from the resulting blocks (by default, what matches the separator regex is not included).
-ne '' filters out the extra empty element that results from the separator expression matching at the very start of the string.
Note that Set-Content knows nothing about the character encoding of the input file and uses its default encoding; use -Encoding as needed.
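For experimenting, the approach can be run end to end in a throwaway directory. This is only a sketch under assumptions: the temp-directory name is made up, and -NoNewline (available in Windows PowerShell 5+ and PowerShell 7+) is an addition that is not part of the command above:

```powershell
# Work in a throwaway temp directory (the directory name is hypothetical, chosen for this demo).
$dir = Join-Path ([IO.Path]::GetTempPath()) ('split-demo-' + [guid]::NewGuid())
$null = New-Item -ItemType Directory -Path $dir
$source = Join-Path $dir 'test.txt'
Set-Content -LiteralPath $source -Value @(
    '--serverone'
    'this is first part of my script'
    '--servertwo'
    'this is second part of my script'
)

$i = 0
(Get-Content -Raw -LiteralPath $source) -split '(?m)^(?=--)' -ne '' |
    ForEach-Object {
        $target = Join-Path $dir ('file' + (++$i))
        # Each block already ends in a newline, so -NoNewline avoids appending a second one.
        Set-Content -LiteralPath $target -Value $_ -NoNewline
    }

# Read the results back as arrays of lines, then clean up the temp directory.
$file1 = Get-Content -LiteralPath (Join-Path $dir 'file1')
$file2 = Get-Content -LiteralPath (Join-Path $dir 'file2')
Remove-Item -LiteralPath $dir -Recurse
```

Without -NoNewline, Set-Content appends a trailing newline of its own, so each output file would end in an extra empty line.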
zett42 points out that the file-writing part can be streamlined with the help of a delay-bind script-block parameter:
$i = 0
(Get-Content -Raw test.txt) -split '(?m)^(?=--)' -ne '' |
Set-Content -LiteralPath { (Get-Variable i -Scope 1).Value++; "file$i" }
The Get-Variable call to access and increment the $i variable in the parent scope is necessary, because delay-bind script blocks (as well as script blocks for calculated properties) run in a child scope - perhaps surprisingly, as discussed in GitHub issue #7157.
A shorter - but even more obscure - option is to use ([ref] $i).Value++ instead; see this answer for details.
zett42 also points to a proposed future enhancement that would obviate the need to maintain the sequence numbers manually, via the introduction of an automatic $PSIndex variable that reflects the sequence number of the current pipeline object: see GitHub issue #13772.

Select-String - find a string that spans multiple lines

I am trying to read lines in a file and search for a pattern that spans two lines. Looking at the file in notepad++ I see a LF char in the file.
Example log.txt:
I want to find this
value here: OK
My simple code does not work and returns nothing:
select-string -Path "log.txt" -Pattern "find this\n*value here: OK"
I have tried many combos of various things here including .+ and \r that I found posted on various threads. I can get the first line by using:
select-string -Path "log.txt" -Pattern "find this\n*"
Result of above is: I want to find this
Adding anything more to the line above results in nothing being returned. Any ideas how to do this using select-string? I was trying to avoid using get content due to the potential size of the files I am working with.
So I think I understand your question. If you have a file that has a line that you want to key off of then the next line is the line that you want to look at:
(Select-String -Path "Log.txt" -Pattern "find this" -Context 1).Context.PostContext
I wasn't sure if that carriage return was an artifact of your formatting or not. If it is not then this would work better:
(Select-String -Path "Log.txt" -Pattern "find this" -Context 2).Context.PostContext[1]
Here is a way to do it if you don't know how many lines will be between the two bits:
$file = Get-Content 'Log.txt' -Raw
$file -match '(?smi)I want to find this.*(value here: OK)'
$matches[1]
Since you might want a multi line regex solution you need to read in the text file as one string.
Using the test file:
Stfuf
Bagel
I want to find this
value here: OK
Things
I was able to get the result using a simple matching pattern that satisfies your example text.
(Get-Content -Raw c:\temp\test.txt | Select-String -Pattern "([\w ]+)\s*value here: OK").Matches.Groups[1].Value
regex101.com
Basically, this captures the text that precedes any run of whitespace (including newlines) followed by the static text "value here: OK". It could be made better with positive look-aheads, but this seems to work fine.
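The reason the original attempts return nothing is that Select-String, when given -Path, tests the pattern against each line separately, so a pattern that spans a newline can never match. A small in-memory sketch of the difference (the sample text is an assumption modeled on the test file above):

```powershell
# Sample text as ONE multi-line string, which is what Get-Content -Raw would return.
$text = "Stuff`nI want to find this`nvalue here: OK`nThings"
$pattern = 'find this\s+value here: OK'

# Line by line (how Select-String -Path sees a file): the pattern cannot span two lines.
$perLine = $text -split "`n" | Select-String -Pattern $pattern

# Whole text as a single input string: \s+ can now match the newline between the lines.
$whole = $text | Select-String -Pattern $pattern

"per-line match found: $($null -ne $perLine)"
"whole-text match found: $($null -ne $whole)"
$whole.Matches[0].Value
```

\s+ is used here so that the pattern tolerates both LF and CRLF line endings.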

search first "x" amount of characters in a line, and output entire line

I have a text file that could contain up to 1000 lines of data in the following format:
14410:3012669|EU14410|20/01/2017||||1|6|4|OUT FROM UNDER||22/02/2017 04:01:47|22/02/2017 21:19:52
14:3012670|EU016271751|20/01/2017||||2|6|4|BLOCK BET|\\acis-prod\Pictures\Entry\EU01627.jpg|22/02/2017 04:02:02|22/02/2017 21:19:52
301111:3012671|EU016275|20/01/2017||||2|6|4|VITAE MEDICAL CLINIC|\\tm-prod\Pictures\Entry\EU01.jpg|22/02/2017 04:02:11|22/02/2017 21:19:53
each line will start with the following format
"set of characters up to max of 8":"set of characters unlimited max"
I want to search the characters ONLY up until the first colon. Those characters could contain any amount up to a maximum of 8. (hopefully shown well in my examples above) I'm trying to search those first characters, up to the ":" of each line to see if it contains a string, and return the whole line. still new to powershell so I've only tried a simple select:
$path = "C:\Users\ME\Desktop\acsep22\acsnic-20170222_233324.done"
Get-ChildItem $path -recurse | Select-String -pattern ("14410","3011981","3011982") | out-file $logfile |format-list
which works - but I didn't take into account that the string could also appear twice in the same line ( though unrelated to the first 7 characters)
for example:
14410:3012669|EU14410|
contains 14410 twice, they're unrelated in terms of their significance and I only want to search and return based on the first number
could somebody help me achieve this or could some one point me toward the cmdlet that would help?
I've tried various searches online (and via the Microsoft online resource) but a lot of results are more to do with "return the first X amount of characters" rather than "search using only the first X amount and return line"
Kind Regards
You could use a simple Where-Object filter to check whether the string before the : is one of the strings you expect:
$strings = '14410','3011981','3011982'
Get-Content $path |Where-Object {$strings -contains ($_ -split ':')[0]}
This is probably the most PowerShell-idiomatic approach.
If you want to use Select-String, you'll have to construct a regex pattern that will match on strings that start with one of the strings and then a colon:
$strings = '14410','3011981','3011982'
$pattern = '^(?:{0}):' -f ($strings -join '|') # pattern is now '^(?:14410|3011981|3011982):'
Select-String -Path $path -Pattern $pattern
If you just want the bare string itself from the output, grab the Line property from the objects returned by Select-String:
Select-String -Path $path -Pattern $pattern |Select-Object -Expand Line
or
Select-String -Path $path -Pattern $pattern |ForEach-Object Line
The pattern above uses a non-capturing group (?:pattern-goes-here) to match any one of the strings, at the start ^ of a string, followed by :.
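Both solutions work with an arbitrary number of strings; for the Select-String variant, this assumes the strings contain no regex metacharacters, because they are joined into the pattern unescaped.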
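Both approaches can be compared on a few shortened sample lines held in memory. The lines are abbreviated from the question, the third one is altered so that 14410 appears only after the colon, and the [regex]::Escape call is an added precaution that is not part of the commands above:

```powershell
# Shortened sample lines (assumption: abbreviated and adapted from the question's data).
$lines = @(
    '14410:3012669|EU14410|20/01/2017'
    '14:3012670|EU016271751|20/01/2017'
    '301111:3012671|EU14410|20/01/2017'
)
$strings = '14410', '3011981'

# Approach 1: compare only the text before the first colon.
$viaFilter = $lines | Where-Object { $strings -contains ($_ -split ':')[0] }

# Approach 2: anchored regex; each string is escaped in case it contains metacharacters.
$pattern = '^(?:{0}):' -f (($strings | ForEach-Object { [regex]::Escape($_) }) -join '|')
$viaRegex = ($lines | Select-String -Pattern $pattern).Line

$viaFilter
$viaRegex
```

The third line contains 14410 only after the colon, so neither approach returns it.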

Select-String pattern not matching

I have the text of a couple hundred Word documents saved into individual .txt files in a folder. I am having an issue where a MergeField in the Word document wasn't formatted correctly, and now I need to find all the instances in the folder where the incorrect formatting occurs. The incorrect formatting is the string \#,$##,##0.00\*, so I'm trying to use PowerShell as follows:
select-string -path MY_PATH\.*txt -pattern '\#,$##,##0.00\*'
select-string -path MY_PATH\.*txt -pattern "\#`,`$##`,##0.00\*"
But neither of those commands finds any results, even though I'm sure the string exists in at least one file. I feel like the error is occurring because there are special characters in the parameter (specifically $ and ,) that I'm not escaping correctly, but I'm not sure how else to format the pattern. Any suggestions?
If you are actually looking for \#,$##,##0.00\* then you need to be aware that Select-String uses regex by default, and your string contains a lot of regex metacharacters. Your pattern should be
\\\#,\$\#\#,\#\#0\.00\\\*
Or you can use the static Escape method of the [regex] class to do the dirty work for you.
[regex]::Escape('\#,$##,##0.00\*')
To put this all together you would get the following:
select-string -path MY_PATH\.*txt -pattern ([regex]::Escape('\#,$##,##0.00\*'))
Even simpler is the -SimpleMatch switch, which does not interpret the pattern as a regex and just searches for the string as-is. More here
select-string -path MY_PATH\.*txt -SimpleMatch '\#,$##,##0.00\*'
My try, similar to Matt's:
select-string -path .\*.txt -pattern '\\#,\$##,##0\.00\\\*'
result:
test.txt:1:\#,$##,##0.00\*
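The three behaviors (unescaped regex, escaped regex, literal search) can be checked side by side on an in-memory line. The sample line is made up for the demo; only the search string comes from the question:

```powershell
# The literal text to find; single quotes keep PowerShell from interpreting the $.
$needle = '\#,$##,##0.00\*'

# Hypothetical line of text containing the badly formatted merge field.
$line = 'MERGEFIELD Amount \#,$##,##0.00\* MERGEFORMAT'

# What [regex]::Escape produces for the search string.
$escaped = [regex]::Escape($needle)

# Unescaped: $ acts as an end-of-line anchor in mid-pattern, so nothing can match.
$asRegex = $line | Select-String -Pattern $needle

# Escaped regex and literal (-SimpleMatch) search both find the text.
$viaEscape = $line | Select-String -Pattern $escaped
$viaSimple = $line | Select-String -SimpleMatch -Pattern $needle

"escaped pattern: $escaped"
"unescaped regex matched: $($null -ne $asRegex)"
"escaped regex matched: $($null -ne $viaEscape)"
"-SimpleMatch matched: $($null -ne $viaSimple)"
```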