Select-String pattern not matching - powershell

I have the text of a couple hundred Word documents saved into individual .txt files in a folder. A MergeField in the Word documents wasn't formatted correctly, and now I need to find every instance in the folder where the incorrect formatting occurs. The incorrect formatting is the string \#,$##,##0.00\*. So, I'm trying to use PowerShell as follows:
select-string -path MY_PATH\.*txt -pattern '\#,$##,##0.00\*'
select-string -path MY_PATH\.*txt -pattern "\#`,`$##`,##0.00\*"
But neither of those commands finds any results, even though I'm sure the string exists in at least one file. I feel like the error is occurring because there are special characters in the parameter (specifically $ and ,) that I'm not escaping correctly, but I'm not sure how else to format the pattern. Any suggestions?

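If you are actually looking for \#,$##,##0.00\* then you need to be aware that Select-String treats the pattern as a regular expression by default, and your string contains several characters that have special meaning in a regex (such as \, $, . and *). Escaped, your pattern should be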
\\\#,\$\#\#,\#\#0\.00\\\*
Or you can use the static method Escape of regex to do the dirty work for you.
[regex]::Escape("\#,$##,##0.00\*")
To put this all together you would get the following:
select-string -path MY_PATH\*.txt -pattern ([regex]::Escape('\#,$##,##0.00\*'))
Or, even simpler, use the -SimpleMatch parameter, which does not interpret the pattern as a regex and just searches for the string as is.
select-string -path MY_PATH\*.txt -pattern '\#,$##,##0.00\*' -SimpleMatch
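A minimal sketch of the two approaches above, run against in-memory strings instead of files so it can be tried anywhere. The sample lines and variable names are made up for illustration; only Select-String, -SimpleMatch and [regex]::Escape come from the answer.

```powershell
# The literal text to search for. Single quotes stop PowerShell from expanding anything.
$literal = '\#,$##,##0.00\*'

# Hypothetical sample lines standing in for the contents of the .txt files.
$lines = 'Amount: \#,$##,##0.00\* due', 'Amount: \# $#,##0.00 due'

# Escape every regex metacharacter so the pattern matches the text literally.
$escaped = [regex]::Escape($literal)

# Both approaches should select only the first sample line.
$viaRegex  = $lines | Select-String -Pattern $escaped
$viaSimple = $lines | Select-String -Pattern $literal -SimpleMatch

$viaRegex.Line
$viaSimple.Line
```

Printing $escaped is a quick way to check what the regex engine will actually receive before running the search over the whole folder.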

My try, similar to Matt's:
select-string -path .\*.txt -pattern '\\#,\$##,##0\.00\\\*'
result:
test.txt:1:\#,$##,##0.00\*

Related

Select-String - find a string that spans multiple lines

I am trying to read lines in a file and search for a pattern that spans two lines. Looking at the file in Notepad++, I see an LF character in the file.
Example log.txt:
I want to find this
value here: OK
My simple code does not work and returns nothing:
select-string -Path "log.txt" -Pattern "find this\n*value here: OK"
I have tried many combos of various things here including .+ and \r that I found posted on various threads. I can get the first line by using:
select-string -Path "log.txt" -Pattern "find this\n*"
Result of above is: I want to find this
Adding anything more to the pattern above results in nothing being returned. Any ideas how to do this using Select-String? I was trying to avoid using Get-Content because of the potential size of the files I am working with.
So I think I understand your question. If you have a file that has a line that you want to key off of then the next line is the line that you want to look at:
(Select-String -Path "Log.txt" -Pattern "find this" -Context 1).Context.PostContext
I wasn't sure whether the extra line break between the two lines was an artifact of your formatting. If it is really in the file, this would work better:
(Select-String -Path "Log.txt" -Pattern "find this" -Context 2).Context.PostContext[1]
Here is a way to do it if you don't know how many lines will be between the two bits:
$file = Get-Content 'Log.txt' -Raw
$file -match '(?smi)I want to find this.*(value here: OK)'
$matches[1]
Since you might want a multi-line regex solution, you need to read the text file in as one string.
Using the test file:
Stfuf
Bagel
I want to find this
value here: OK
Things
I was able to get the result using a simple matching pattern that satisfies your example text.
(Get-Content -Raw c:\temp\test.txt | Select-String -Pattern "([\w ]+)\s*value here: OK").Matches.Groups[1].Value
regex101.com
Basically, this captures the text that precedes any run of whitespace (including newlines) followed by the static text "value here: OK". It could be made stricter with a positive lookahead, but this seems to work fine.
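One possible lookahead variant, as a rough sketch rather than a drop-in replacement. The sample text is hypothetical and stands in for the output of Get-Content -Raw; the pattern assumes the wanted line is immediately followed by the "value here: OK" line.

```powershell
# Hypothetical sample text standing in for (Get-Content -Raw log.txt); lines end in LF.
$text = "Stuff`nBagel`nI want to find this`nvalue here: OK`nThings"

# (?m) lets ^ match at the start of every line. The group captures one whole line,
# and the lookahead (?=...) requires the next line to begin with the static text
# without making it part of the match. \r? tolerates CRLF line endings.
$pattern = '(?m)^([^\r\n]+)\r?\n(?=value here: OK)'

$found = [regex]::Match($text, $pattern)
$found.Groups[1].Value
```

Because the lookahead consumes nothing, the match itself contains only the first line and its line break, which can be convenient when the matched text is reused afterwards.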

Powershell String search

I am trying to search a file for a keyword/pattern, where each line starts with a date.
A line looks like this:
11/02/15 02:28:49%%PROGRAM$$SUCCESS$$End.
So I tried the command below:
Select-String -Path C:\Path\To\File.txt -Pattern $(Get-Date -format d) | Select-String -Pattern SUCCESS
The idea is to get the lines that contain SUCCESS and start with the current date.
It works on my test box, but when I tried the same on a big file (~200 MB), it gives no results. I also tried the following:
Get-Content -Path C:\Path\To\File.txt | Select-String -Pattern $(Get-Date -format d) | Select-String -Pattern SUCCESS
Any help would be greatly appreciated!
Some things to consider here. As PetSerAl brings to light, Get-Date -Format d depends on the culture, so you need to be careful about relying on the output of that.
If the files you're searching are generated using Get-Date -Format d then it makes sense to do the search that way as long as the files will always be searched on a machine with the same culture they were generated with.
By the way on my machine it's 11/2/2015 not 11/02/15 and I am in the US.
Also, when you use Select-String -Pattern it's a regular expression, so you need to make sure that there are no special characters in the string. In the case of PetSerAl's date, the dots . would be interpreted as special characters. To avoid that use [RegEx]::Escape().
Select-String returns a match object (or objects), so piping it directly into another Select-String may not work. Consider making a single pattern out of it.
Just a guess here, but it kind of seems like the pattern you want is to match the current date string at the beginning of the line and then find SUCCESS anywhere after that in the line.
I think for that you could use a pattern like this: 11/02/15.+?SUCCESS
So code like this:
Get-Content -Path C:\Path\To\File.txt | Select-String -Pattern "$([RegEx]::Escape((Get-Date -Format d))).+?SUCCESS"
Would do the trick I think, again assuming culture issues don't mess you up.
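If culture is a concern, one option is to format the date explicitly instead of relying on -Format d. The sketch below assumes the log always writes dates as MM/dd/yy, as in the sample line; the fixed date, the sample lines and the variable names are illustrative only. It also anchors the pattern with ^ so the date has to be at the start of the line.

```powershell
# A fixed date keeps the example repeatable; use (Get-Date) for the current date.
$date = New-Object datetime 2015, 11, 2

# Assumption: the log uses MM/dd/yy. InvariantCulture keeps "/" as a literal slash.
$stamp = $date.ToString('MM/dd/yy', [System.Globalization.CultureInfo]::InvariantCulture)

# Date at the start of the line, then SUCCESS anywhere later on the same line.
$pattern = '^' + [regex]::Escape($stamp) + '.+?SUCCESS'

# Hypothetical sample lines modelled on the one in the question.
$lines = '11/02/15 02:28:49%%PROGRAM$$SUCCESS$$End.',
         '11/02/15 02:29:10%%PROGRAM$$FAILURE$$End.',
         '11/01/15 23:59:59%%PROGRAM$$SUCCESS$$End.'

$hits = $lines | Select-String -Pattern $pattern
$hits.Line
```

For a real file, the same $pattern can be passed to Select-String -Path, which reads the file line by line.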

How can I filter filename patterns in PowerShell?

I need to do something similar to Unix's ls | grep 'my[rR]egexp?' in Powershell. The similar expression ls | Select-String -Pattern 'my[rR]egexp?' seems to go through contents of the listed files, rather than simply filtering the filenames themselves.
The Select-String documentation hasn't been of much help either.
Very simple:
ls | where Name -match 'myregex'
There are other options, though:
(ls) -match 'myregex'
Or, depending on how complex your regex is, you could maybe also solve it with a simple wildcard match:
ls wild[ck]ard*.txt
which is faster than above options. And if you can get it into a wildcard match without character classes you can also just use the -Filter parameter to Get-ChildItem (ls), which performs filtering on the file system level and thus is even faster. Note also that PowerShell is case-insensitive by default, so a character class like [rR] is unnecessary.
While researching based on @Joey's answer, I stumbled upon another way to achieve the same (based on Select-String itself):
ls -Name | Select-String -Pattern 'my[Rr]egexp?'
The -Name argument seems to make ls return the result as a plain string rather than FileInfo object, so Select-String treats it as the string to be searched in rather than a list of files to be searched.
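To illustrate the point about case sensitivity, here is a small sketch that filters a list of names held in memory. The file names are invented for the example; with real files the list could come from ls -Name.

```powershell
# Hypothetical file names standing in for the output of: ls -Name
$names = 'myRegex.txt', 'myregexp.log', 'other.txt', 'MYREGEX.md'

# -match ignores case, so the [rR] character class makes no difference here.
$insensitive = $names -match 'my[rR]egexp?'

# -cmatch is the case-sensitive operator, closer to what grep does by default.
$sensitive = $names -cmatch 'my[rR]egexp?'

$insensitive
$sensitive
```

Applied to an array, both operators return the matching elements rather than a Boolean, which is why they can be used as a filter.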

Count specific string in text file using PowerShell

Is it possible to count specific strings in a file and save the value in a variable?
For me it would be the String "/export" (without quotes).
Here's one method:
$FileContent = Get-Content "YourFile.txt"
# Use a name other than $Matches, which is an automatic variable set by -match.
$Results = Select-String -InputObject $FileContent -Pattern "/export" -AllMatches
$Results.Matches.Count
Here's a way to do it.
$count = (get-content file1.txt | select-string -pattern "/export").length
As mentioned in comments, this will return the count of lines containing the pattern, so if any line has more than one instance of the pattern, the count won't be correct.
If you're searching in a large file (several gigabytes) that could have millions of matches, you might run into memory problems. You can do something like this (inspired by a suggestion from NealWalters):
Select-String -Path YourFile.txt -Pattern '/export' -SimpleMatch | Measure-Object -Line
This is not perfect because it counts the number of lines that contain the match, not the total number of matches, and because it prints some headings along with the count rather than putting just the count into a variable.
You can probably solve these if you need to. But at least you won't run out of memory.
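One way to address both points, sketched under the assumption that streaming the file through Select-String is acceptable. The temp file and its contents are made up so the example is self-contained. Note that -SimpleMatch leaves the Matches property empty, so the literal text is escaped and searched as a regex with -AllMatches instead.

```powershell
# Hypothetical sample file created in the temp folder so the sketch is self-contained.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'export-count-sample.txt'
Set-Content -Path $path -Value 'GET /export /export', 'GET /import', 'GET /export'

# -AllMatches records every match on a line. Summing the per-line match counts
# gives the number of occurrences, and .Sum puts just the number into a variable.
$total = (Select-String -Path $path -Pattern ([regex]::Escape('/export')) -AllMatches |
    ForEach-Object { $_.Matches.Count } |
    Measure-Object -Sum).Sum

$total

# Clean up the sample file.
Remove-Item $path
```

The pipeline handles one matching line at a time, so memory use should stay modest even when the file is large.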
grep -co vs grep -c
Both are useful and thanks for the "o" version. New one to me.

How can I remove CRLF if anywhere between double quotes, using PowerShell?

My text file looks like this.
"MikeCRLF","","","Dell","DevelCRLFCRLFoper"CRLF
"SuCRLFsan","","","Apple","ManagCRLFer"CRLF
Desired result:
"Mike","","","Dell","Developer"LF
"Susan","","","Apple","Manager"LF
I tried this on PowerShell:
$path = "C:\Users\abc\Desktop\1.txt"
(Get-Content $path -Raw).Replace("`r`n","`n") | Set-Content $path -Force
When I do this, I don't get the desired result. Also, I am left with one CRLF at the end. I don't want that either.
Please tell me how to do this using PowerShell v3.
This method avoids checking to see if \r\n is in quotes. Instead, it tries to find the "real" end of line situations and converts those first. Then it just purges the rest.
(Get-Content test.txt -Raw) -replace '([^,]")(\s*\r\n\s*)+("[^,])',"`$1`n`$3" -replace '\r\n',''
I think this should handle most of the stuff you throw at it, but let me know if you find a special case.
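For comparison, here is a sketch of the other approach, which does look inside the quotes. It is an alternative, not the method above. It assumes the quotes in the file are balanced, and it works on an in-memory string built from the question's sample, where the token CRLF marks each line break.

```powershell
# The question's sample, with the token CRLF standing for a real carriage return + line feed.
$raw = '"MikeCRLF","","","Dell","DevelCRLFCRLFoper"CRLF' +
       '"SuCRLFsan","","","Apple","ManagCRLFer"CRLF'
$text = $raw -creplace 'CRLF', "`r`n"

# Step 1: inside each quoted field, drop every CR and LF.
$stripBreaks = [System.Text.RegularExpressions.MatchEvaluator] {
    param($m)
    $m.Value -replace '[\r\n]+', ''
}
$joined = [regex]::Replace($text, '"[^"]*"', $stripBreaks)

# Step 2: the remaining CRLFs are real row endings; convert them to LF.
$result = $joined -replace '\r\n', "`n"
$result
```

When writing $result back to disk, keep in mind that Set-Content appends its own line ending by default, so an extra step may be needed if the file must end in a single LF.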
If you are using the PowerShell Community Extensions, you can use the ConvertTo-UnixLineEnding command e.g.:
ConvertTo-UnixLineEnding C:\users\abc\desktop1.txt -dest desktop1-converted.txt -Enc ascii