Count specific string in text file using PowerShell - powershell

Is it possible to count specific strings in a file and save the value in a variable?
For me it would be the String "/export" (without quotes).

Here's one method:
$FileContent = Get-Content "YourFile.txt"
# $Matches is an automatic variable in PowerShell, so store the result under a different name
$Hits = Select-String -InputObject $FileContent -Pattern "/export" -AllMatches
$Hits.Matches.Count

Here's a way to do it.
$count = @(Get-Content file1.txt | Select-String -Pattern "/export").Count
As mentioned in comments, this will return the count of lines containing the pattern, so if any line has more than one instance of the pattern, the count won't be correct.

If you're searching in a large file (several gigabytes) that could have millions of matches, you might run into memory problems. You can do something like this (inspired by a suggestion from NealWalters):
Select-String -Path YourFile.txt -Pattern '/export' -SimpleMatch | Measure-Object -Line
This is not perfect because it counts the number of lines that contain the match, not the total number of matches, and because it prints some headings along with the count, rather than putting just the count into a variable.
You can probably solve these if you need to. But at least you won't run out of memory.
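As a rough sketch of how both limitations might be addressed while still streaming the file: -AllMatches records every hit on a line, and summing the per-line counts yields a plain number in a variable. The sample file name and its contents are made up for illustration. -SimpleMatch is swapped for an escaped regex here on the assumption that the Matches property is only populated for regex searches.

```powershell
# Hypothetical sample file so the sketch is self-contained.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'export-count-demo.txt'
Set-Content -Path $path -Value @(
    'mount /export/home and /export/data'
    'nothing to see here'
    'umount /export/home'
)

# Escape the search text so it is treated literally by the regex engine.
$pattern = [regex]::Escape('/export')

# Number of lines that contain the pattern (what the commands above report).
$lineCount = @(Select-String -Path $path -Pattern $pattern).Count

# Total number of occurrences: sum the matches found on each matching line.
$matchCount = (Select-String -Path $path -Pattern $pattern -AllMatches |
    ForEach-Object { $_.Matches.Count } |
    Measure-Object -Sum).Sum

"Lines: $lineCount, matches: $matchCount"
```

For the sample above this reports 2 lines and 3 matches, since the first line contains the string twice.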

grep -co vs grep -c
Both are useful and thanks for the "o" version. New one to me.

Related

Get a line number on Powershell?

So I have been searching a lot and couldn't find anything that actually returns a result.
I have a code with a variable and I have a file with a lot of lines on it.
For example, I have the following file (things.txt):
Ketchup
Mustard
Pumpkin
Mustard
Ketchup
And what I want to take out is the line numbers of "Mustard". Here's the code I'm trying right now
$search="Mustard"
$linenumber=Get-Content things.txt | select-string $search -context 0,1
$linenumber.context
But it actually returns "". Everyone online was about using context but I only want to know the line number of every "Mustard" which are 2 and 4.
Thanks for your help!
Select-String returns the line number for you. You're just looking at the wrong property. Change your code to:
$search="Mustard"
$linenumber= Get-Content things.txt | select-string $search
$linenumber.LineNumber
For people who are searching for a one-liner, the code is self-explanatory.
Get-Content .\myTextFile.txt | Select-String -Pattern "IamSearching" | Select-Object LineNumber
Edit(April 2022):
OR
(Select-String .\myTextFile.txt -Pattern "IamSearching").LineNumber
Try inserting -Path after Get-Content.
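To tie the answers above to the sample from the question, here is a small sketch that recreates things.txt in the temp folder (that location is an assumption, chosen only to keep the example self-contained) and reads the line numbers straight from the MatchInfo objects:

```powershell
# Recreate the sample file from the question in a temporary location.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'things.txt'
Set-Content -Path $path -Value 'Ketchup', 'Mustard', 'Pumpkin', 'Mustard', 'Ketchup'

$search = 'Mustard'

# Each MatchInfo object carries the line number and the matching line itself.
$hits = @(Select-String -Path $path -Pattern $search -SimpleMatch)
$lineNumbers = @($hits | ForEach-Object { $_.LineNumber })

$hits | Select-Object LineNumber, Line
$lineNumbers -join ', '   # 2, 4 for the sample file
```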

Select-String - find a string that spans multiple lines

I am trying to read lines in a file and search for a pattern that spans two lines. Looking at the file in notepad++ I see a LF char in the file.
Example log.txt:
I want to find this
value here: OK
My simple code does not work and returns nothing:
select-string -Path "log.txt" -Pattern "find this\n*value here: OK"
I have tried many combos of various things here including .+ and \r that I found posted on various threads. I can get the first line by using:
select-string -Path "log.txt" -Pattern "find this\n*"
Result of above is: I want to find this
Adding anything more to the line above results in nothing being returned. Any ideas how to do this using select-string? I was trying to avoid using get content due to the potential size of the files I am working with.
So I think I understand your question. If you have a file that has a line that you want to key off of then the next line is the line that you want to look at:
(Select-String -Path "Log.txt" -Pattern "find this" -Context 1).Context.PostContext
I wasn't sure if that carriage return was an artifact of your formatting or not. If it is not then this would work better:
(Select-String -Path "Log.txt" -Pattern "find this" -Context 2).Context.PostContext[1]
Here is a way to do it if you don't know how many lines will be between the two bits:
$file = Get-Content 'Log.txt' -Raw
$file -match '(?smi)I want to find this.*(value here: OK)'
$matches[1]
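If reading the whole file with -Raw is a concern because of its size, the -Context idea above can also be used as a filter: keep a hit only when the line that follows it contains the second half of the pattern. This is only a sketch under the assumption that the two parts are always on adjacent lines. The sample file is made up.

```powershell
# Hypothetical sample log so the sketch is self-contained.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'context-demo.txt'
Set-Content -Path $path -Value 'Stfuf', 'Bagel', 'I want to find this', 'value here: OK', 'Things'

# -Context 0,1 keeps zero lines before and one line after every hit.
$hits = @(Select-String -Path $path -Pattern 'find this' -SimpleMatch -Context 0,1 |
    Where-Object {
        # Keep the hit only if a following line exists and starts with the expected text.
        $_.Context.PostContext -and $_.Context.PostContext[0] -match '^value here: OK'
    })

$hits | ForEach-Object { '{0}: {1} / {2}' -f $_.LineNumber, $_.Line, $_.Context.PostContext[0] }
```

Select-String still reads the file line by line here, so memory use should stay modest even for large logs.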
Since you might want a multi line regex solution you need to read in the text file as one string.
Using the test file:
Stfuf
Bagel
I want to find this
value here: OK
Things
I was able to get the result using a simple matching pattern that satisfies your example text.
(Get-Content -Raw c:\temp\test.txt | Select-String -Pattern "([\w ]+)\s*value here: OK").Matches.Groups[1].Value
regex101.com
Basically, this captures the text that precedes a run of whitespace (including newlines) followed by the static text "value here: OK". It could be made better with positive lookaheads, but this seems to work fine.

search first "x" amount of characters in a line, and output entire line

I have a text file that could contain up to 1000 lines of data in the following format:
14410:3012669|EU14410|20/01/2017||||1|6|4|OUT FROM UNDER||22/02/2017 04:01:47|22/02/2017 21:19:52
14:3012670|EU016271751|20/01/2017||||2|6|4|BLOCK BET|\\acis-prod\Pictures\Entry\EU01627.jpg|22/02/2017 04:02:02|22/02/2017 21:19:52
301111:3012671|EU016275|20/01/2017||||2|6|4|VITAE MEDICAL CLINIC|\\tm-prod\Pictures\Entry\EU01.jpg|22/02/2017 04:02:11|22/02/2017 21:19:53
each line will start with the following format
"set of characters up to max of 8":"set of characters unlimited max"
I want to search the characters ONLY up until the first colon. Those characters could contain any amount up to a maximum of 8. (hopefully shown well in my examples above) I'm trying to search those first characters, up to the ":" of each line to see if it contains a string, and return the whole line. still new to powershell so I've only tried a simple select:
$path = "C:\Users\ME\Desktop\acsep22\acsnic-20170222_233324.done"
Get-ChildItem $path -recurse | Select-String -pattern ("14410","3011981","3011982") | out-file $logfile
which works - but I didn't take into account that the string could also appear twice in the same line ( though unrelated to the first 7 characters)
for example:
14410:3012669|EU14410|
contains 14410 twice, they're unrelated in terms of their significance and I only want to search and return based on the first number
could somebody help me achieve this or could some one point me toward the cmdlet that would help?
I've tried various searches online (and via the Microsoft online resource) but a lot of results are more to do with "return the first X amount of characters" rather than "search using only the first X amount and return line"
Kind Regards
You could use a simple Where-Object filter to check whether the string before the : is one of the strings you expect:
$strings = '14410','3011981','3011982'
Get-Content $path |Where-Object {$strings -contains ($_ -split ':')[0]}
This is probably the most PowerShell-idiomatic approach.
If you want to use Select-String, you'll have to construct a regex pattern that will match on strings that start with one of the strings and then a colon:
$strings = '14410','3011981','3011982'
$pattern = '^(?:{0}):' -f ($strings -join '|') # pattern is now '^(?:14410|3011981|3011982):'
Select-String -Path $path -Pattern $pattern
If you just want the bare string itself from the output, grab the Line property from the objects returned by Select-String:
Select-String -Path $path -Pattern $pattern |Select-Object -Expand Line
or
Select-String -Path $path -Pattern $pattern |ForEach-Object Line
The pattern above uses a non-capturing group (?:pattern-goes-here) to match any one of the strings, at the start ^ of a string, followed by :.
Both solutions will work with an arbitrary number of strings
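Here is a quick way to check the anchored pattern against data shaped like the question's. The file name and the shortened sample lines are made up, and the [regex]::Escape step is an extra precaution that only matters if the search strings could ever contain regex metacharacters.

```powershell
# Hypothetical sample data: only the first line has a wanted prefix before the colon.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'prefix-demo.txt'
Set-Content -Path $path -Value @(
    '14410:3012669|EU14410|20/01/2017'
    '14:3012670|EU016271751|20/01/2017'
    '301111:3012671|EU14410|20/01/2017'
)

$strings = '14410', '3011981', '3011982'

# Escape each string, then anchor the alternation to the start of the line and the colon.
$escaped = $strings | ForEach-Object { [regex]::Escape($_) }
$pattern = '^(?:{0}):' -f ($escaped -join '|')

$lines = @(Select-String -Path $path -Pattern $pattern | ForEach-Object { $_.Line })
$lines
```

The third sample line contains 14410 after the colon but is not returned, because the pattern only looks at the start of each line.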

Extract specific data

Please help. I am trying to extract multiple filenames from the following .xml file. I then need to copy the list of files from one folder to another. A part of the XML I have posted below:
<component>
<altname>HP Broadcom Online Firmware Upgrade Utility for VMware 5.x</altname>
<filename>CP021404.scexe</filename>
<name>HP Broadcom Online Firmware Upgrade Utility for VMware 5.x</name>
<description>This package contains vSphere 5.1 and VMware </description>
</component>
<component>
<altname>Online ROM Flash - Power Management Controller </altname>
<filename>CP021615.scexe</filename>
I used Windows PowerShell as below and got the output, but the output contains filenames (CP021404.scexe, CP021614.scexe below), line# and symbol still in it. What am I doing wrong on my first PS attempt?
PowerShell
$input_path = 'C:\PowerShell\hpsum_inventory.xml'
$output_file = 'C:\powershell\hpsum_inventory-o.xml'
$regex = ".exe"
select-string -Path $input_path -Pattern $regex -AllMatches > $output_file
Output
PowerShell\hpsum_inventory.xml:8: <filename>CP021404.scexe</filename>
PowerShell\hpsum_inventory.xml:18: <filename>CP021614.scexe</filename>
The problem is that you're using a RegEx match and the period character in RegEx matches any character except Line Feed/New Line characters, so it's matching any character followed by 'exe'. Really what you want to do is read the file as XML, and just output the <filename> nodes.
$input_path = 'C:\PowerShell\hpsum_inventory.xml'
$output_file = 'C:\powershell\hpsum_inventory-o.xml'
$regex = "exe$"
(Select-Xml -Path $input_path -XPath //filename).node.InnerText | ?{$_ -match $regex} | out-file $output_file
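To try the Select-Xml approach without the original inventory file, here is a minimal sketch using a small made-up document. The root element name, the file location, and the trimmed-down components are assumptions. Only the filename elements matter for the query.

```powershell
# Hypothetical, well-formed stand-in for the inventory file.
$xmlPath = Join-Path ([System.IO.Path]::GetTempPath()) 'inventory-demo.xml'
Set-Content -Path $xmlPath -Value @'
<components>
  <component>
    <altname>HP Broadcom Online Firmware Upgrade Utility for VMware 5.x</altname>
    <filename>CP021404.scexe</filename>
  </component>
  <component>
    <altname>Online ROM Flash - Power Management Controller</altname>
    <filename>CP021615.scexe</filename>
  </component>
</components>
'@

# //filename selects every filename element regardless of its depth in the document.
$names = @(Select-Xml -Path $xmlPath -XPath '//filename' |
    ForEach-Object { $_.Node.InnerText } |
    Where-Object { $_ -match 'exe$' })
$names
```

Because the file is parsed as XML, the result contains only the element text, with no path, line number, or tag markup attached.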
Edit: Ok, you need to incorporate that into a string, that's easy enough. We'll add a ForEach loop (I use the alias % for that) to the last line to insert the file name into a string.
(Select-Xml -Path $input_path -XPath //filename).node.InnerText | ?{$_ -match $regex} | %{"copy c:\powershell\$_ x:\firmware\"} | out-file $output_file
Edit2: Ok, so you want the knowledge in general of how to match text in a file. Can do! Select string will do what you want actually, it just wasn't the best method in general for the example you gave earlier. This gets a bit more interesting, since you need to be familiar with RegEx matching patterns, but other than that it's fairly straight forward. You want to use the -Pattern match again, but let me suggest a better pattern:
"filename>(.*?)<"
That looks for the filename tag, including closing > on it, and grabs everything up to the next < character. The () denote a capturing group, so the rest is ignored as far as the capture goes. Then we pipe to a ForEach loop, and for each line that it finds that matches we select the Matches property, and the second Group property of that (the first contains the whole text, including the filename> and < bits). So it looks like this:
$input_path = 'C:\PowerShell\hpsum_inventory.xml'
$output_file = 'C:\powershell\hpsum_inventory-o.xml'
$regex = "filename>(.*?)<"
select-string -Path $input_path -Pattern "filename>(.*?)<"|%{$_.matches.groups[1].value}
Now that only gets the file names. If we want to incorporate the rest of your thing about inserting it into text you enclose the part in the ForEach loop inside a sub-expression $() and then put that into your double quoted string like such:
select-string -Path $input_path -Pattern "filename>(.*?)<"|%{"copy c:\powershell\$($_.matches.groups[1].value) x:\firmware"}|Out-File $output_file
Personally I would suggest not doing that directly as it limits you. I'd collect the data in an array, then pipe that array into a process that does what you want, but then at least you have the collection so you can do with it what you want.
$input_path = 'C:\PowerShell\hpsum_inventory.xml'
$output_file = 'C:\powershell\hpsum_inventory-o.xml'
$regex = "filename>(.*?)<"
$Filenames = select-string -Path $input_path -Pattern "filename>(.*?)<"|%{$_.matches.groups[1].value}
$Filenames|%{"copy c:\powershell\$_ x:\firmware"}|Out-File $output_file
Why do it that way? What if you don't want to over-write something? Then you can do something like:
$Filenames|?{$_ -notin (GCI X:\firmware -file|select -expand name)}|%{"copy c:\powershell\$_ x:\firmware"}|Out-File $output_file
For your collection of serial numbers, try the regex pattern of:
"Serial Number: (\S*)"
In RegEx there are a few escaped characters that have special meaning, and capitalizing them inverts that meaning. \s means whitespace, so spaces, tabs, what not. Doing it as a capital means something that is NOT whitespace. The asterisk means however many of the previous thing (not whitespace) it can find. So this looks for 'Serial Number: ' and then captures everything after that until it reaches the end of the line or encounters whitespace. Check out this link to see how it works.
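A small sketch of that capture group in action, using made-up input lines piped directly into Select-String instead of a file:

```powershell
# Hypothetical input lines; the second one has no serial number.
$lines = 'Serial Number: ABC123 rev 2', 'Model: X', 'Serial Number: ZZ-9'

# Groups[0] is the whole match; Groups[1] is the part captured by (\S*).
$serials = @($lines |
    Select-String -Pattern 'Serial Number: (\S*)' |
    ForEach-Object { $_.Matches[0].Groups[1].Value })
$serials
```

The capture stops at the first whitespace, so the first line yields ABC123 and the trailing "rev 2" is left out.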

How to get files which have multiple required strings

I am trying to find the files which have given strings. I am using the below line
Get-ChildItem -recurse | Select-String -pattern "Magnet","Stew" | group path | select name
But it returns the files that contain either one of the words "Magnet","Stew". I want the files that contain both words. Logically speaking, the above command treats the patterns as an "Or" condition. I want an "And" condition. Can anybody guide me on how to do this?
Try this for your regex:
'.*(?:stew.*magnet)|(?:magnet.*stew).*'
Edit:
If you're looking for those matches anywhere in the file:
Get-ChildItem -Recurse |
where {(
($_ | Select-String -Pattern 'Stew' -SimpleMatch -Quiet) -and
($_ | Select-String -Pattern 'Magnet' -SimpleMatch -Quiet)
)}
The solution posted at the link CB provided will work, but doesn't seem very efficient for what you're needing to do. The -SimpleMatch will be faster than using a Regex pattern, and the -Quiet switch will make it just return True or False, depending on whether it found a match in the file. If one or the other of the terms is much less likely to appear in the file, change the order so that one appears first in the test stack so the test will fail sooner and it can move on to the next file.
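The same idea can be generalized to any number of required terms. The sketch below is one possible way to do that, with a made-up folder and file names so it can be run on its own. A file passes only when no term is missing from it.

```powershell
# Hypothetical test folder with one file containing both words.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'and-search-demo'
New-Item -ItemType Directory -Path $dir -Force | Out-Null
Set-Content -Path (Join-Path $dir 'both.txt')   -Value 'Magnet', 'Stew'
Set-Content -Path (Join-Path $dir 'magnet.txt') -Value 'Magnet only'
Set-Content -Path (Join-Path $dir 'stew.txt')   -Value 'Stew only'

$terms = 'Magnet', 'Stew'

$found = @(Get-ChildItem -Path $dir -File | Where-Object {
    $file = $_
    # Collect the terms that are NOT in the file; keep the file only if that list is empty.
    $missing = @($terms | Where-Object {
        -not (Select-String -Path $file.FullName -Pattern $_ -SimpleMatch -Quiet)
    })
    $missing.Count -eq 0
} | ForEach-Object { $_.Name })
$found
```

Only both.txt is returned for the sample folder. As noted above, listing the rarest term first should let most files be rejected after a single search.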