Is there a way to use Select-String to print the line number and the matched value (not the line)? - powershell

I am using PowerShell to search several files for a specific match to a regular expression. Because it is a regular expression, I want to see only what I have programmed my regex to accept, and the line number at which it is matched.
I then want to take the matched value and the line number and create an object to output to an excel file.
I can get each item in individual select string statements, but then they won't be matched up with each other
Select-String -Path $pathToFile -Pattern '(?<={\n\s*Box\s=\s")14\d{3}(?=",)' |
Select LineNumber, Matches.Value
#Will only print out the lineNumber
Select-String -Path $pathToFile -Pattern '(?<={\n\s*Box\s=\s")14\d{3}(?=",)' |
Foreach {$_.matches} | Select value
#Will only print matched value and can't print linenumber
Can anyone help me get both the line number and the matched value?
Edit: Just to clarify what I am doing
$files = Get-ChildItem $directory -Include *.vb,*.cs -Recurse
$colMatchedFiles = @()
foreach ($file in $files) {
$fileContent = Select-String -Path $file -Pattern '(?<={\n\s*Box\s=\s")14\d{3}(?=",)' |
Select-Object LineNumber, @{Name="Match"; Expression={$_.Matches[0].Groups[1].Value}}
write-host $fileContent #just for checking if there is anything
}
This still does not get anything, it just outputs a bunch of blank lines
Edit: What I am expecting to happen is for this script to search the content of all the files in the directory and find the lines that match the regular expression. Below is what I would expect for output for each file in the loop
LineNumber Match
---------- -----
       324    15
       582   118
       603   139
       ...   ...
File match sample:
{
Box = "5015",
Description = "test box 1"
}....
{
Box = "5118",
Description = "test box 2"
}...
{
Box = "5139",
Description = "test box 3"
}...

Example 1
Select the LineNumber and group value for each match. Example:
$sampleData = @'
prefix 1 A B suffix 1
prefix 2 A B suffix 2
'@ -split "`n"
$sampleData | Select-String '(A B)' |
Select-Object LineNumber,
@{Name="Match"; Expression={$_.Matches[0].Groups[1].Value}}
Example 2
Search *.vb and *.cs for files containing the string Box = "<n>", where <n> is some number, and output the filename, line number of the file, and the number on the box = lines. Sample code:
Get-ChildItem $pathToFiles -Include *.cs,*.vb -Recurse |
Select-String 'box = "(\d+)"' |
Select-Object Path,
LineNumber,
@{Name="Match"; Expression={$_.Matches[0].Groups[1].Value -as [Int]}}
This returns output like the following:
Path             LineNumber Match
----             ---------- -----
C:\Temp\test1.cs          2  5715
C:\Temp\test1.cs          6  5718
C:\Temp\test1.cs         10  5739
C:\Temp\test1.vb          2  5015
C:\Temp\test1.vb          6  5118
C:\Temp\test1.vb         10  5139
Example 3
Now that we know that we want the line before the match to contain {, we can use the -Context parameter with Select-String. Example:
Get-ChildItem $pathToFiles -Include *.cs,*.vb -Recurse |
Select-String 'box = "(\d+)"' -Context 1 | ForEach-Object {
# Line prior to match must contain '{' character
if ( $_.Context.DisplayPreContext[0] -like "*{*" ) {
[PSCustomObject] @{
Path = $_.Path
LineNumber = $_.LineNumber
Match = $_.Matches[0].Groups[1].Value
}
}
}
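Since the question mentions sending the results to Excel, here is a minimal, self-contained sketch of that last step. It assumes a CSV file is an acceptable Excel input; the file names `boxsample.vb` and `boxsample.csv` are hypothetical and only exist for this illustration.

```powershell
# Hypothetical sample file, created in the temp directory for illustration only
$sampleFile = Join-Path ([IO.Path]::GetTempPath()) 'boxsample.vb'
@'
{
Box = "5015",
Description = "test box 1"
}
{
Box = "5118",
Description = "test box 2"
}
'@ | Set-Content -Path $sampleFile

# One object per matching line: the line number plus the captured digits
$results = Select-String -Path $sampleFile -Pattern 'Box = "(\d+)"' |
    ForEach-Object {
        [PSCustomObject]@{
            LineNumber = $_.LineNumber
            Match      = $_.Matches[0].Groups[1].Value
        }
    }

# Export-Csv writes a file that Excel can open directly
$csvFile = Join-Path ([IO.Path]::GetTempPath()) 'boxsample.csv'
$results | Export-Csv -Path $csvFile -NoTypeInformation
$results
```

Because each result is a single object, the line number and the matched value stay paired all the way into the CSV.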

Related

Accessing matches variables in Powershell

This question is more about my understanding Powershell's objects rather than solving this practical example. I know there are other ways of separating out a page number from a string.
In my example I want to do this by accessing the object-match-value of the piped pattern match.
# data
$headerString = 'BARTLETT-BEDGGOOD__PAGE_5 BEECH-BEST__PAGE_6'
# require the number of page only
$regexPageNum = '([0-9]$)'
# split the header string into two separate strings to access page numbers
[string[]]$pages = $null
$pages = $headerString -split ' '
# access page numbers using regex pattern
$pages[0] | Select-String -AllMatches -Pattern $regexPageNum | Select-Object {$_.Matches.Value}
The output is:
$_.Matches.Value
----------------
5
Okay. So far so good. I see the page number of array member $pages[0]. But how do I take this value from the object? The following does not work.
$x = $pages[0] | Select-String -AllMatches -Pattern $regexPageNum | Select-Object {$_.Matches.Value}
Write-Host "Here it is:"$x
Output:
Here it is: @{$_.Matches.Value=5}
Instead of assigning the value 5 to the variable $x, PowerShell assigns what looks to me like a hash table with an object description as its only member.
But if I try to access my variable using "Brackets for Access" (reference: hashtables), PowerShell indicates that the variable $x is in fact an array.
$x = $pages[0] | Select-String -AllMatches -Pattern $regexPageNum | Select-Object {$_.Matches.Value}
Write-Host "Here it is:"$x
$y = $x[$_.Matches.Value]
Write-Host "What about now:"$y
Output:
Here it is: @{$_.Matches.Value=5}
InvalidOperation:
Line |
33 | $y = $x[$_.Matches.Value]
| ~~~~~~~~~~~~~~~~~~~~~~~~~
| Index operation failed; the array index evaluated to null.
What about now:
Okay. At this stage I know I'm being silly. But the point I'm trying to make is: How can I retrieve the value I want when I'm done with the Powershell object?
You can use $x.{ $_.Matches.Value } to access the value.
$x = $pages[0] | Select-String -AllMatches -Pattern $regexPageNum | Select-Object { $_.Matches.Value }
$x.{ $_.Matches.Value } # This will print 5
That is, you have to wrap the property name in {} because the property name contains ".".
Instead of doing it this way, I would suggest creating a calculated property with Select-Object, which makes the code more readable.
$x = $pages[0] | Select-String -AllMatches -Pattern $regexPageNum | Select-Object @{Name = 'PageNumber'; Expression = {$_.Matches.Value}}
$x.PageNumber
#Access matches in case of single match
$x = "red blue yellow green" | select-string -Pattern 'blue'
$x.matches.value
#Output
blue
#Access matches in case of multi match
$x = "red blue yellow green blue" | select-string -Pattern 'blue' -AllMatches
$x.matches.value
#Output
blue
blue
When you use a scriptblock as a parameter to Select-Object the return value will contain a property whose name matches the source code of the script block...
PS> @{ "aaa" = "bbb" } | select-object { $_.aaa; <# xxx #> }
$_.aaa; <# xxx #>
-------------------
bbb
In this pathological case, if I want to access the property I can't use the name in the default "dotted" notation because it contains reserved characters, but you can access it if you quote the property name:
PS> $x = @{ "aaa" = "bbb" } | select-object { $_.aaa; <# xxx #> }
# note the leading and trailing spaces in the string because
# the original scriptblock source contains spaces between the "{" and "}"
PS> $x.' $_.aaa; <# xxx #> '
bbb
In your case you'd do this:
PS> $x = $pages[0] | Select-String -AllMatches -Pattern $regexPageNum | Select-Object {$_.Matches.Value}
PS> $x.'$_.Matches.Value'
Other options work too:
$x = $pages[0] `
| Select-String -AllMatches -Pattern $regexPageNum `
| Select-Object {$_.Matches.Value}
# get the property whose name is contained in the $name variable
PS> $name = '$_.Matches.Value'
PS> $x.$name
5
# the scriptblock gets converted into a string, and then that string
# is used as a property name
PS> $x.{$_.Matches.Value}
5
# note the whitespace in both scriptblocks has to match *exactly* otherwise the property name won't be found
PS> $x.{ $_.Matches.Value }
ParentContainsErrorRecordException: The property ' $_.Matches.Value ' cannot be found on this object. Verify that the property exists.
but...
There's an easier way - if you pass a hashtable to Select-Object instead of a scriptblock you can specify the name of the property - e.g.
PS> $x = $pages[0] `
| Select-String -AllMatches -Pattern $regexPageNum `
| Select-Object @{ "l"="Count"; "e"={$_.Matches.Value} }
PS> $x
Count
-----
5
PS> $x.Count
5
References:
about_Calculated_Properties - Hashtable key definitions
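If only the bare value is wanted rather than a wrapper object with one property, two common routes are emitting the value from ForEach-Object or unwrapping a named property with -ExpandProperty. The sketch below reuses the header string from the question; the variable names `$pageNumbers` and `$first` are illustrative only.

```powershell
$headerString = 'BARTLETT-BEDGGOOD__PAGE_5 BEECH-BEST__PAGE_6'
$pages = $headerString -split ' '

# ForEach-Object emits the value itself, so $pageNumbers holds plain strings
$pageNumbers = $pages |
    Select-String -Pattern '([0-9])$' |
    ForEach-Object { $_.Matches[0].Groups[1].Value }

# -ExpandProperty unwraps a named calculated property in the same way
$first = $pages[0] |
    Select-String -Pattern '([0-9])$' |
    Select-Object @{ Name = 'PageNumber'; Expression = { $_.Matches[0].Value } } |
    Select-Object -ExpandProperty PageNumber

Write-Host "Pages: $($pageNumbers -join ', '); first: $first"
```

Either way, the variable ends up holding the string itself, so no property lookup is needed afterwards.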

Powershell script to sum only specific records from file(s)

I am trying to work with a directory full of files.
I want to find specific rows within the file,
from those rows, extract a numeric value
and then sum up all these values across all files in the directory.
It would look like this...
File1.txt
bread:123
ham:456
eggs:789
File2.txt
bread:999
mayo:789
eggs:123
and so on...
I want to find the row with eggs, extract the number, and sum these numbers together across files.
I found this script in other posts, but it's only segments; I still have trouble understanding how to use pipes, variables, and braces.
dir . -filter "*.txt" -Recurse -name | foreach{(GC $_).Count} | measure-object -sum
#?
Get-Content | Select-String -Pattern "eggs*"
#?
$record -split ":"
I want the script to say "eggs = 912" which would be 123 + 789 = 912
Here is a possible solution:
$pattern = 'eggs'
$sum = Get-ChildItem . -File -Recurse -Filter *.txt |
Get-Content |
Where-Object { $_ -match $pattern } |
ForEach-Object { ($_ -split ':')[1] } |
Measure-Object -Sum |
ForEach-Object Sum
"$pattern = $sum"
Output:
eggs = 912
Get-ChildItem finds all files recursively that match the filter
Get-Content reads each line of every file and passes that on in the pipeline
Where-Object includes only lines that match the given RegEx pattern
The ForEach-Object line splits the line at : and extracts the sub string, which is at array index [1].
Measure-Object accumulates all numbers (it converts strings to double, if necessary). Internally, it creates a variable in its begin block, accumulates the pipeline input to this variable in its process block and outputs the variable value in its end block.
The last ForEach-Object line is necessary because Measure-Object actually outputs an object with a Sum property, but we only want the value of that property, not the entire object. If you'd remove that line you'd have to write "$pattern = $($sum.Sum)" instead, to access the Sum property of the sum object.
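As a variation on the steps above, the row filter and the number extraction can be folded into one anchored regex with a capture group, which also avoids counting rows that merely contain the word. This is a sketch under assumptions: the scratch directory `eggs-demo` and its two files are hypothetical and mirror the sample data from the question.

```powershell
# Hypothetical sample files in a scratch directory, for illustration only
$dir = Join-Path ([IO.Path]::GetTempPath()) 'eggs-demo'
New-Item -ItemType Directory -Path $dir -Force | Out-Null
"bread:123", "ham:456", "eggs:789" | Set-Content -Path (Join-Path $dir 'File1.txt')
"bread:999", "mayo:789", "eggs:123" | Set-Content -Path (Join-Path $dir 'File2.txt')

$item = 'eggs'
# Anchor the pattern so only rows named exactly $item are counted
$pattern = '^{0}\s*:\s*(\d+)\s*$' -f [regex]::Escape($item)

$sum = (Get-ChildItem -Path $dir -Filter *.txt -File |
    Select-String -Pattern $pattern |
    ForEach-Object { [int]$_.Matches[0].Groups[1].Value } |
    Measure-Object -Sum).Sum

"$item = $sum"
```

Wrapping the pipeline in parentheses and reading `.Sum` plays the same role as the final `ForEach-Object Sum` step in the answer.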
You can treat them as CSV files. Import-Csv doesn't take wildcards for the filename.
import-csv file1.txt,file2.txt -Delimiter : -Header item,amount |
where item -eq eggs | measure -sum amount
Count : 2
Average :
Sum : 912
Maximum :
Minimum :
StandardDeviation :
Property : amount

powershell find and replace text in each file with specific match and extension

I am trying to do two things. First, I want to remove all text after a match, and second, I want to replace the matched line with new text.
In my example below I want to find the line List of animals and replace all lines following it in all documents *.txt with the text found in replaceWithThis.txt.
My first foreach below will remove everything that follows List of animals and my second foreach will replace List of animals and thus also adding new content following that line by content from replaceWithThis.txt.
replaceWithThis.txt contains:
List of animals
Cat 5
Lion 3
Bird 2
*.txt contains:
List of cities
London 2
New York 3
Beijing 6
List of car brands
Volvo 2
BMW 3
Audi 5
List of animals
Cat 1
Dog 3
Bird 7
Code:
$replaceWithThis = Get-Content c:\temp\replaceWithThis.txt -Raw
$allFiles = Get-ChildItem "c:\temp" -recurse | where {$_.extension -eq ".txt"}
$line = Get-Content c:\temp\*.txt | Select-String cLuxPlayer_SaveData | Select-Object -ExpandProperty Line
foreach ($file in $allFiles)
{
(Get-Content $file.PSPath) |
ForEach-object { $_.Substring(15, $_.lastIndexOf('List of animals')) } |
Set-Content $file.PSPath
}
foreach ($file in $allFiles)
{
(Get-Content $file.PSPath) |
Foreach-Object { $_ -replace $line,$replaceWithThis } |
Set-Content $file.PSPath
}
Final result in all (*.txt) should be:
List of cities
London 2
New York 3
Beijing 6
List of car brands
Volvo 2
BMW 3
Audi 5
List of animals
Cat 5
Lion 3
Bird 2
Using Regular Expression, the below code should work:
$filesPath = 'c:\temp'
$replaceFile = 'c:\temp\replaceWithThis.txt'
$regexToFind = '(?sm)(List of animals(?:(?!List).)*)'
$replaceWithThis = (Get-Content -Path $replaceFile -Raw).Trim()
Get-ChildItem -Path $filesPath -Filter *.txt | ForEach-Object {
$content = $_ | Get-Content -Raw
if ($content -match $regexToFind) {
Write-Host "Replacing text in file '$($_.FullName)'"
$_ | Set-Content -Value ($content -replace [regex]::Escape($matches[1].Trim()), $replaceWithThis) -Force
}
}
Regex Details
( Match the regular expression below and capture its match into backreference number 1
List of animals Match the characters “List of animals” literally
(?: Match the regular expression below
(?! Assert that it is impossible to match the regex below starting at this position (negative lookahead)
List Match the characters “List” literally
)
. Match any single character
)* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
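To see the pattern at work without touching any files, here is a minimal sketch that applies it to an in-memory string in a single -replace. The sample strings are shortened versions of the question's data; doubling `$` in the replacement text is a precaution I have added, because `$` introduces group references in a -replace replacement string.

```powershell
$content = @'
List of car brands
Volvo 2
List of animals
Cat 1
Dog 3
'@

$replaceWithThis = @'
List of animals
Cat 5
Lion 3
'@

# (?s) lets . span newlines; the negative lookahead stops at the next "List" heading
$regexToFind = '(?s)List of animals(?:(?!List).)*'

# Double any $ so the replacement text is inserted literally
$safeReplacement = $replaceWithThis.Replace('$', '$$')
$updated = $content -replace $regexToFind, $safeReplacement
$updated
```

Note that -replace is case-insensitive by default, so the lookahead also stops at a lowercase "list" anywhere in the section.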

Count number of comments over multiple files, including multi-line comments

I'm trying to write a script that counts all comments in multiple files, including both single line (//) and multi-line (/* */) comments and prints out the total. So, the following file would return 4
// Foo
var text = "hello world";
/*
Bar
*/
alert(text);
There's a requirement to include specific file types and exclude certain file types and folders, which I already have working in my code.
My current code is:
( gci -include *.cs,*.aspx,*.js,*.css,*.master,*.html -exclude *.designer.cs,jquery* -recurse `
| ? { $_.FullName -inotmatch '\\obj' } `
| ? { $_.FullName -inotmatch '\\packages' } `
| ? { $_.FullName -inotmatch '\\release' } `
| ? { $_.FullName -inotmatch '\\debug' } `
| ? { $_.FullName -inotmatch '\\plugin-.*' } `
| select-string "^\s*//" `
).Count
How do I change this to get multi-line comments as well?
UPDATE: My final solution (slightly more robust than what I was asking for) is as follows:
$CodeFiles = Get-ChildItem -include *.cs,*.aspx,*.js,*.css,*.master,*.html -exclude *.designer.cs,jquery* -recurse |
Where-Object { $_.FullName -notmatch '\\(obj|packages|release|debug|plugin-.*)\\' }
$TotalFiles = $CodeFiles.Count
$IndividualResults = @()
$CommentLines = ($CodeFiles | ForEach-Object{
#Get the comments via regex
$Comments = ([regex]::matches(
[IO.File]::ReadAllText($_.FullName),
'(?sm)^[ \t]*(//[^\n]*|/[*].*?[*]/)'
).Value -split '\r?\n') | Where-Object { $_.length -gt 0 }
#Get the total lines
$Total = ($_ | select-string .).Count
#Add to the results table
$IndividualResults += @{
File = $_.FullName | Resolve-Path -Relative;
Comments = $Comments.Count;
Code = ($Total - $Comments.Count)
Total = $Total
}
Write-Output $Comments
}).Count
$TotalLines = ($CodeFiles | select-string .).Count
$TotalResults = New-Object PSObject -Property @{
Files = $TotalFiles
Code = $TotalLines - $CommentLines
Comments = $CommentLines
Total = $TotalLines
}
Write-Output (Get-Location)
Write-Output $IndividualResults | % { new-object PSObject -Property $_} | Format-Table File,Code,Comments,Total
Write-Output $TotalResults | Format-Table Files,Code,Comments,Total
To be clear: Using string matching / regular expressions is not a fully robust way to detect comments in JavaScript / C# code, because there can be false positives (e.g., var s = "/* hi */";); for robust parsing you'd need a language parser.
If that is not a concern, and it is sufficient to detect comments (that start) on their own line, optionally preceded by whitespace, here's a concise solution (PSv3+):
(Get-ChildItem -include *.cs,*.aspx,*.js,*.css,*.master,*.html -exclude *.designer.cs,jquery* -recurse |
Where-Object { $_.FullName -notmatch '\\(obj|packages|release|debug|plugin-.*)' } |
ForEach-Object {
[regex]::matches(
[IO.File]::ReadAllText($_.FullName),
'(?sm)^[ \t]*(//[^\n]*|/[*].*?[*]/)'
).Value -split '\r?\n'
}
).Count
With the sample input, the ForEach-Object command yields 4.
Remove the ^[ \t]* part to match comments starting anywhere on a line.
The solution reads each input file as a single string with [IO.File]::ReadAllText() and then uses the [regex]::Matches() method to extract all (potentially line-spanning) comments.
Note: You could use Get-Content -Raw instead to read the file as a single string, but that is much slower, especially when processing multiple files.
The regex uses in-line options s and m ((?sm)) to respectively make . match newlines too and to make anchors ^ and $ match line-individually.
^[ \t]* matches any mix of spaces and tabs, if any, at the start of a line.
//[^\n]* matches a string that starts with // through the end of the line.
/[*].*?[*]/ matches a block comment across multiple lines; note the lazy quantifier, *?, which ensures that the very next instance of the closing */ delimiter is matched.
The matched comments (.Value) are then split into individual lines (-split '\r?\n'), which are output.
The resulting lines across all files are then counted (.Count)
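The steps above can be tried on the question's sample without any files. This is a minimal sketch on an in-memory string; the variable names `$source` and `$commentLines` are illustrative only.

```powershell
$source = @'
// Foo
var text = "hello world";
/*
Bar
*/
alert(text);
'@

# Find every comment, then split block comments into their individual lines
$commentLines = @(
    [regex]::Matches($source, '(?sm)^[ \t]*(//[^\n]*|/[*].*?[*]/)').Value -split '\r?\n'
)
$commentLines.Count
```

The single-line comment contributes one line and the block comment contributes three, which is where the total of 4 comes from.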
As for what you tried:
The fundamental problem with your approach is that Select-String with file-info object input (such as provided by Get-ChildItem) invariably processes the input files line by line.
While this could be remedied by calling Select-String inside a ForEach-Object script block in which you pass each file's content as a single string to Select-String, direct use of the underlying regex .NET types, as shown above, is more efficient.
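For comparison, the Select-String variant mentioned above might look like the following sketch. It assumes PSv3+ for `Get-Content -Raw`; the file name `comment-sample.js` is hypothetical and is created only for this illustration.

```powershell
# Hypothetical sample file, written to the temp directory for illustration only
$sampleFile = Join-Path ([IO.Path]::GetTempPath()) 'comment-sample.js'
@'
// Foo
var text = "hello world";
/*
Bar
*/
alert(text);
'@ | Set-Content -Path $sampleFile

$pattern = '(?sm)^[ \t]*(//[^\n]*|/[*].*?[*]/)'

$commentCount = Get-Item -Path $sampleFile | ForEach-Object {
    # -Raw returns the whole file as one string, so a match can span lines
    $matchInfo = Get-Content -Raw -LiteralPath $_.FullName |
        Select-String -Pattern $pattern -AllMatches
    if ($matchInfo) {
        @($matchInfo.Matches.Value -split '\r?\n').Count
    } else {
        0   # no comments found in this file
    }
}
$commentCount
```

The explicit check on `$matchInfo` matters because Select-String emits nothing when a file has no comments.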
An IMO better approach is to count net code lines by removing single/multi line comments.
For a start, here is a script that handles a single file; for your sample.cs above it returns 5.
((Get-Content sample.cs -raw) -replace "(?sm)^\s*\/\/.*?$" `
-replace "(?sm)\/\*.*?\*\/.*`n" | Measure-Object -Line).Lines
EDIT: without removing empty lines, build the difference from total lines
## Q:\Test\2018\10\31\SO_53092258.ps1
$Data = Get-ChildItem *.cs | ForEach-Object {
$Content = Get-Content $_.FullName -Raw
$TotalLines = (Measure-Object -Input $Content -Line).Lines
$CodeLines = ($Content -replace "(?sm)^\s*\/\/.*?$" `
-replace "(?sm)\/\*.*?\*\/.*`n" | Measure-Object -Line).Lines
$Comments = $TotalLines - $CodeLines
[PSCustomObject]@{
File = $_.FullName
Lines = $TotalLines
Comments= $Comments
}
}
$Data
"="*40
"TotalLines={0} TotalCommentLines={1}" -f (
$Data | Measure-Object -Property Lines,Comments -Sum).Sum
Sample output:
> Q:\Test\2018\10\31\SO_53092258.ps1
File                          Lines Comments
----                          ----- --------
Q:\Test\2018\10\31\example.cs    10        5
Q:\Test\2018\10\31\sample.cs      9        4
============================================
TotalLines=19 TotalCommentLines=9

Count tabs per line and return the lines with too many tabs

Looking for a PowerShell script that looks in a text file for rows that have too many (or too few) tabs.
I found this PowerShell script that does exactly what I want (almost).
This counts the number of tabs per row:
Get-Content test.txt | ForEach-Object {
($_ | Select-String `t -all).matches | Measure-Object | Select-Object count
}
Can someone extend/modify/re-write this to return only the rows (with row numbers) that have more than, or less than, X number of tabs per row?
Don't use Get-Content before piping to Select-String, you'll lose contextual information about each line.
Instead, use the -Path parameter with Select-String:
$Tabs = Select-String -Path .\test.txt -Pattern "`t" -AllMatches
$Tabs |Select-Object LineNumber,Line,@{Name='TabCount';Expression={ $_.Matches.Count }}
To return only the ones where the number of tabs is greater than $x, use Where-Object:
$x = 3
$Tabs |Where-Object { $_.Matches.Count -gt $x } | Select-Object -ExpandProperty Line
If you just want a quick overview of the distribution, you could also use Group-Object:
Get-Content .\test.txt | Group-Object { "{0} tabs" -f [regex]::Matches($_,"`t").Count }
Lots of ways to do this. Get-Content works just fine for me and we create a custom object that you can then filter as desired.
Get-Content test.txt | ForEach-Object{
New-Object PSObject -Property @{
Line = $_
LineNumber = $_.ReadCount
NumberofTabs = [regex]::matches($_,"`t").count
}
}
Use the .net regex method to count the tabs returned and populate a value based on the result.
NumberofTabs Number Line
------------ ------ ----
8 1 ;lkjasfdsa
8 2 asdfasdf
4 3 asdfasdfasdfa
2 4 fasdfjasdlfjas;l
Now you can use PowerShell to filter as you see fit.
} | Where-Object { $_.NumberofTabs -ne 4}
So if 4 was the perfect number then line 3 would be omitted from the results.
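Putting the pieces together, a self-contained sketch of the "more than or less than X tabs" filter could look like this. The file name `tabs-sample.txt` and the expected count of 2 are assumptions for the illustration. Reading with Get-Content means rows with no tabs at all are still checked, whereas a Select-String search for the tab character never reports such rows because they produce no match.

```powershell
# Hypothetical sample file, written to the temp directory for illustration only
$sampleFile = Join-Path ([IO.Path]::GetTempPath()) 'tabs-sample.txt'
"a`tb`tc", "a`tb", "a`tb`tc`td", "no tabs here" | Set-Content -Path $sampleFile

$expected = 2   # hypothetical "correct" number of tabs per row

$badRows = Get-Content -Path $sampleFile | ForEach-Object {
    [PSCustomObject]@{
        LineNumber = $_.ReadCount                       # line number added by Get-Content
        TabCount   = [regex]::Matches($_, "`t").Count   # tabs on this row
        Line       = $_
    }
} | Where-Object { $_.TabCount -ne $expected }

$badRows | Format-Table LineNumber, TabCount, Line
```

Swapping `-ne` for `-gt` or `-lt` restricts the report to rows with too many or too few tabs respectively.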