PowerShell: find and replace text in each file with a specific match and extension

I am trying to do two things: first, remove all text after a match, and second, replace the matching line with new text.
In my example below, I want to find the line List of animals and replace it and all lines following it, in all *.txt documents, with the text found in replaceWithThis.txt.
My first foreach below should remove everything that follows List of animals, and my second foreach should replace List of animals, thereby also adding the content of replaceWithThis.txt after that line.
replaceWithThis.txt contains:
List of animals
Cat 5
Lion 3
Bird 2
*.txt contains:
List of cities
London 2
New York 3
Beijing 6
List of car brands
Volvo 2
BMW 3
Audi 5
List of animals
Cat 1
Dog 3
Bird 7
Code:
$replaceWithThis = Get-Content c:\temp\replaceWithThis.txt -Raw
$allFiles = Get-ChildItem "c:\temp" -recurse | where {$_.extension -eq ".txt"}
$line = Get-Content c:\temp\*.txt | Select-String cLuxPlayer_SaveData | Select-Object -ExpandProperty Line
foreach ($file in $allFiles)
{
(Get-Content $file.PSPath) |
ForEach-object { $_.Substring(15, $_.lastIndexOf('List of animals')) } |
Set-Content $file.PSPath
}
foreach ($file in $allFiles)
{
(Get-Content $file.PSPath) |
Foreach-Object { $_ -replace $line,$replaceWithThis } |
Set-Content $file.PSPath
}
The final result in all *.txt files should be:
List of cities
London 2
New York 3
Beijing 6
List of car brands
Volvo 2
BMW 3
Audi 5
List of animals
Cat 5
Lion 3
Bird 2

Using a regular expression, the code below should work:
$filesPath = 'c:\temp'
$replaceFile = 'c:\temp\replaceWithThis.txt'
$regexToFind = '(?sm)(List of animals(?:(?!List).)*)'
$replaceWithThis = (Get-Content -Path $replaceFile -Raw).Trim()
Get-ChildItem -Path $filesPath -Filter *.txt | ForEach-Object {
$content = $_ | Get-Content -Raw
if ($content -match $regexToFind) {
Write-Host "Replacing text in file '$($_.FullName)'"
$_ | Set-Content -Value ($content.Replace($matches[1].Trim(), $replaceWithThis)) -Force  # literal replace, so the matched text is not reinterpreted as a regex
}
}
Regex Details
(?sm)  Inline options: s makes . match newlines as well; m makes ^ and $ match at line boundaries (it has no effect here, since the pattern uses neither)
(  Match the regular expression below and capture its match into backreference number 1
  List of animals  Match the characters “List of animals” literally
  (?:  Match the regular expression below (non-capturing group)
    (?!  Assert that it is impossible to match the regex below starting at this position (negative lookahead)
      List  Match the characters “List” literally
    )
    .  Match any single character (including newlines, because of s)
  )*  Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
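To see what the pattern captures without touching any files, here is a small in-memory sketch using the sample text from the question. It assumes the file contents are already in strings (the variable names $content and $newBlock are illustrative) and uses the literal String.Replace() method for the substitution, so the captured text is not reinterpreted as a regex.

```powershell
# Sample input from the question, held in memory instead of in *.txt files
$content = @'
List of cities
London 2
New York 3
Beijing 6
List of car brands
Volvo 2
BMW 3
Audi 5
List of animals
Cat 1
Dog 3
Bird 7
'@

# Replacement block (the content of replaceWithThis.txt in the question)
$newBlock = @'
List of animals
Cat 5
Lion 3
Bird 2
'@

$regexToFind = '(?sm)(List of animals(?:(?!List).)*)'

$result = $content
if ($content -match $regexToFind) {
    # $Matches[1] runs from "List of animals" up to (not including) the next "List",
    # or to the end of the text; Trim() drops any trailing line break.
    $result = $content.Replace($Matches[1].Trim(), $newBlock.Trim())
}
$result
```

Because the lookahead stops at the next "List", the same pattern also works when the animals section is not the last one in the file.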

Related

Count number of comments over multiple files, including multi-line comments

I'm trying to write a script that counts all comments in multiple files, including both single-line (//) and multi-line (/* */) comments, and prints out the total. For example, the following file should return 4:
// Foo
var text = "hello world";
/*
Bar
*/
alert(text);
There's a requirement to include specific file types and exclude certain file types and folders, which I already have working in my code.
My current code is:
( gci -include *.cs,*.aspx,*.js,*.css,*.master,*.html -exclude *.designer.cs,jquery* -recurse `
| ? { $_.FullName -inotmatch '\\obj' } `
| ? { $_.FullName -inotmatch '\\packages' } `
| ? { $_.FullName -inotmatch '\\release' } `
| ? { $_.FullName -inotmatch '\\debug' } `
| ? { $_.FullName -inotmatch '\\plugin-.*' } `
| select-string "^\s*//" `
).Count
How do I change this to get multi-line comments as well?
UPDATE: My final solution (slightly more robust than what I was asking for) is as follows:
$CodeFiles = Get-ChildItem -include *.cs,*.aspx,*.js,*.css,*.master,*.html -exclude *.designer.cs,jquery* -recurse |
Where-Object { $_.FullName -notmatch '\\(obj|packages|release|debug|plugin-.*)\\' }
$TotalFiles = $CodeFiles.Count
$IndividualResults = @()
$CommentLines = ($CodeFiles | ForEach-Object{
#Get the comments via regex
$Comments = ([regex]::matches(
[IO.File]::ReadAllText($_.FullName),
'(?sm)^[ \t]*(//[^\n]*|/[*].*?[*]/)'
).Value -split '\r?\n') | Where-Object { $_.length -gt 0 }
#Get the total lines
$Total = ($_ | select-string .).Count
#Add to the results table
$IndividualResults += @{
File = $_.FullName | Resolve-Path -Relative;
Comments = $Comments.Count;
Code = ($Total - $Comments.Count)
Total = $Total
}
Write-Output $Comments
}).Count
$TotalLines = ($CodeFiles | select-string .).Count
$TotalResults = New-Object PSObject -Property @{
Files = $TotalFiles
Code = $TotalLines - $CommentLines
Comments = $CommentLines
Total = $TotalLines
}
Write-Output (Get-Location)
Write-Output $IndividualResults | % { new-object PSObject -Property $_} | Format-Table File,Code,Comments,Total
Write-Output $TotalResults | Format-Table Files,Code,Comments,Total
To be clear: Using string matching / regular expressions is not a fully robust way to detect comments in JavaScript / C# code, because there can be false positives (e.g., var s = "/* hi */";); for robust parsing you'd need a language parser.
If that is not a concern, and it is sufficient to detect comments (that start) on their own line, optionally preceded by whitespace, here's a concise solution (PSv3+):
(Get-ChildItem -include *.cs,*.aspx,*.js,*.css,*.master,*.html -exclude *.designer.cs,jquery* -recurse |
Where-Object { $_.FullName -notmatch '\\(obj|packages|release|debug|plugin-.*)' } |
ForEach-Object {
[regex]::matches(
[IO.File]::ReadAllText($_.FullName),
'(?sm)^[ \t]*(//[^\n]*|/[*].*?[*]/)'
).Value -split '\r?\n'
}
).Count
With the sample input, the command above yields 4.
Remove the ^[ \t]* part to match comments starting anywhere on a line.
The solution reads each input file as a single string with [IO.File]::ReadAllText() and then uses the [regex]::Matches() method to extract all (potentially line-spanning) comments.
Note: You could use Get-Content -Raw instead to read the file as a single string, but that is much slower, especially when processing multiple files.
The regex uses the inline options s and m ((?sm)): s makes . match newlines too, and m makes the anchors ^ and $ match at the start and end of each line rather than only at the start and end of the whole string.
^[ \t]* matches any mix of spaces and tabs, if any, at the start of a line.
//[^\n]* matches a string that starts with // through the end of the line.
/[*].*?[*]/ matches a block comment, possibly spanning multiple lines; note the lazy quantifier *?, which ensures that the very next instance of the closing */ delimiter is matched.
The matched comments (.Value) are then split into individual lines (-split '\r?\n'), which are output.
The resulting lines across all files are then counted (.Count).
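As a quick sanity check, here is the same regex applied to the sample file from the question, held in a string instead of read from disk (the variable names are illustrative):

```powershell
# The sample file from the question, as an in-memory string
$code = @'
// Foo
var text = "hello world";
/*
Bar
*/
alert(text);
'@

$pattern = '(?sm)^[ \t]*(//[^\n]*|/[*].*?[*]/)'

# .Value collects the text of every match; -split breaks block comments into lines
$commentLines = [regex]::Matches($code, $pattern).Value -split '\r?\n'
$commentLines.Count
```

The single-line comment contributes one line and the block comment contributes three, for a total of 4.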
As for what you tried:
The fundamental problem with your approach is that Select-String with file-info object input (such as provided by Get-ChildItem) invariably processes the input files line by line.
While this could be remedied by calling Select-String inside a ForEach-Object script block in which you pass each file's content as a single string to Select-String, direct use of the underlying regex .NET types, as shown above, is more efficient.
In my opinion, a better approach is to count net code lines by removing single-line and multi-line comments.
For a start, here is a script that handles a single file and returns the result 5 for your sample.cs above:
((Get-Content sample.cs -raw) -replace "(?sm)^\s*\/\/.*?$" `
-replace "(?sm)\/\*.*?\*\/.*`n" | Measure-Object -Line).Lines
EDIT: Without removing empty lines, compute the number of comment lines as the difference from the total line count:
## Q:\Test\2018\10\31\SO_53092258.ps1
$Data = Get-ChildItem *.cs | ForEach-Object {
$Content = Get-Content $_.FullName -Raw
$TotalLines = (Measure-Object -Input $Content -Line).Lines
$CodeLines = ($Content -replace "(?sm)^\s*\/\/.*?$" `
-replace "(?sm)\/\*.*?\*\/.*`n" | Measure-Object -Line).Lines
$Comments = $TotalLines - $CodeLines
[PSCustomObject]@{
File = $_.FullName
Lines = $TotalLines
Comments= $Comments
}
}
$Data
"="*40
"TotalLines={0} TotalCommentLines={1}" -f (
$Data | Measure-Object -Property Lines,Comments -Sum).Sum
Sample output:
> Q:\Test\2018\10\31\SO_53092258.ps1
File Lines Comments
---- ----- --------
Q:\Test\2018\10\31\example.cs 10 5
Q:\Test\2018\10\31\sample.cs 9 4
============================================
TotalLines=19 TotalCommentLines=9
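The last line of the script relies on member enumeration: with two -Property names, Measure-Object emits one result per property, and .Sum collects both sums into an array that -f consumes as its two format arguments. A small sketch with records mirroring the sample output above:

```powershell
# Per-file records shaped like $Data above (values taken from the sample output)
$Data = @(
    [PSCustomObject]@{ File = 'example.cs'; Lines = 10; Comments = 5 }
    [PSCustomObject]@{ File = 'sample.cs';  Lines = 9;  Comments = 4 }
)

# One measurement object per property, so .Sum yields an array: 19, 9
$sums = ($Data | Measure-Object -Property Lines, Comments -Sum).Sum
$summary = "TotalLines={0} TotalCommentLines={1}" -f $sums
$summary
```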

Parse text file with powershell, print out lines starting with 2 different strings

I have a text file containing repeating patterns of text (a STIG review document)
Sample:
Group ID (Vulid):  V-71989
Group Title:  SRG-OS-000445-GPOS-00199
.
Vulnerability Discussion:  ...
Check Content:...
<hash symbol> some command
.
I want to output the line beginning with "Group ID (Vulid)"
AND the line beginning with "#" in the order present in the file.
I have tried:
Get-Content C:\in-file.txt | (Where-Object {$_ -match 'Group ID'}) | (Where-Object {$_ -match '#'}) | Set-Content C:\out.txt
but it barfs on the "Or".
What Bill_Stewart is trying to say is that it should be this:
Get-Content C:\in-file.txt |
Where-Object {$_ -match 'Group ID' -or $_ -match '#'} |
Set-Content C:\out.txt
Multi-line is just for readability; you can have it as one line in your code.
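Note that -match looks for the text anywhere in the line, so a line that merely contains # in the middle would also be kept. Since the question asks for lines that begin with either string, here is a sketch that anchors a single alternation pattern with ^ (the sample lines below are made up for illustration):

```powershell
# Hypothetical sample lines; the third contains '#' but does not start with it
$lines = @(
    'Group ID (Vulid):  V-71989'
    'Group Title:  SRG-OS-000445-GPOS-00199'
    'Check Content: see the # line below'
    '# some command'
)

# ^ anchors at the start of the line; \( and \) escape the literal parentheses
$selected = $lines | Where-Object { $_ -match '^(Group ID \(Vulid\)|#)' }
$selected
```

With files, the same filter slots into the pipeline from the answer: Get-Content C:\in-file.txt | Where-Object { $_ -match '^(Group ID \(Vulid\)|#)' } | Set-Content C:\out.txt. Lines keep their original order.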

Is there a way to use Select-String to print the line number and the matched value (not the line)?

I am using PowerShell to search several files for matches of a specific regular expression. Because it is a regular expression, I want to see only what my regex is programmed to accept, along with the line number at which it matched.
I then want to take the matched value and the line number and create an object to output to an excel file.
I can get each item with separate Select-String statements, but then they won't be matched up with each other:
Select-String -Path $pathToFile -Pattern '(?<={\n\s*Box\s=\s")14\d{3}(?=",)' |
Select LineNumber, Matches.Value
#Will only print out the lineNumber
Select-String -Path $pathToFile -Pattern '(?<={\n\s*Box\s=\s")14\d{3}(?=",)' |
Foreach {$_.matches} | Select value
#Will only print matched value and can't print linenumber
Can anyone help me get both the line number and the matched value?
Edit: Just to clarify what I am doing
$files = Get-ChildItem $directory -Include *.vb,*.cs -Recurse
$colMatchedFiles = @()
foreach ($file in $files) {
$fileContent = Select-String -Path $file -Pattern '(?<={\n\s*Box\s=\s")14\d{3}(?=",)' |
Select-Object LineNumber, @{Name="Match"; Expression={$_.Matches[0].Groups[1].Value}}
write-host $fileContent #just for checking if there is anything
}
This still does not get anything; it just outputs a bunch of blank lines.
Edit: What I expect is for this script to search the content of all the files in the directory and find the lines that match the regular expression. Below is the output I would expect for each file in the loop:
LineNumber Match
---------- -----
324 15
582 118
603 139
... ...
File match sample:
{
Box = "5015",
Description = "test box 1"
}....
{
Box = "5118",
Description = "test box 2"
}...
{
Box = "5139",
Description = "test box 3"
}...
Example 1
Select the LineNumber and group value for each match. Example:
$sampleData = @'
prefix 1 A B suffix 1
prefix 2 A B suffix 2
'@ -split "`n"
$sampleData | Select-String '(A B)' |
Select-Object LineNumber,
@{Name="Match"; Expression={$_.Matches[0].Groups[1].Value}}
Example 2
Search *.vb and *.cs files for lines containing the string Box = "<n>", where <n> is some number, and output the file name, the line number, and the number from the Box = line. Sample code:
Get-ChildItem $pathToFiles -Include *.cs,*.vb -Recurse |
Select-String 'box = "(\d+)"' |
Select-Object Path,
LineNumber,
@{Name="Match"; Expression={$_.Matches[0].Groups[1].Value -as [Int]}}
This returns output like the following:
Path LineNumber Match
---- ---------- -----
C:\Temp\test1.cs 2 5715
C:\Temp\test1.cs 6 5718
C:\Temp\test1.cs 10 5739
C:\Temp\test1.vb 2 5015
C:\Temp\test1.vb 6 5118
C:\Temp\test1.vb 10 5139
Example 3
Now that we know that we want the line before the match to contain {, we can use the -Context parameter with Select-String. Example:
Get-ChildItem $pathToFiles -Include *.cs,*.vb -Recurse |
Select-String 'box = "(\d+)"' -Context 1 | ForEach-Object {
# Line prior to match must contain '{' character
if ( $_.Context.DisplayPreContext[0] -like "*{*" ) {
[PSCustomObject]@{
Path = $_.Path
LineNumber = $_.LineNumber
Match = $_.Matches[0].Groups[1].Value
}
}
}
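Why the question's attempts printed nothing: Select-String processes its input one line at a time, so a pattern that needs \n inside it (like the lookbehind (?<={\n\s*Box...)) can never match a single line. If you want to keep that multi-line lookbehind, one alternative (a sketch, not part of the answer above) is to run [regex]::Matches() against the whole text and derive each line number from the match position. It assumes LF line endings after the normalization shown, and uses \d+ instead of the question's 14\d{3} so that it matches the sample; for a real file you would read the text with Get-Content -Raw.

```powershell
# Sample content in the shape shown in the question (inline instead of a file);
# CRLF is normalized to LF so that \n in the pattern matches every line break
$text = @'
{
  Box = "5015",
  Description = "test box 1"
}
{
  Box = "5118",
  Description = "test box 2"
}
'@ -replace '\r\n', "`n"

# Only numbers on a Box line that directly follows a '{' line
$pattern = '(?<=\{\n\s*Box\s=\s")\d+(?=",)'

$results = foreach ($m in [regex]::Matches($text, $pattern)) {
    [PSCustomObject]@{
        # Line number = number of line breaks before the match index, plus one
        LineNumber = ($text.Substring(0, $m.Index) -split "`n").Count
        Match      = $m.Value
    }
}
$results
```

Example 3 above avoids this cost by letting -Context supply the preceding line, which is usually the simpler choice.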

Using a variable within a PowerShell Select-String

I have one section of my PowerShell script that I'm currently stuck on. Basically, I have two files that I want to compare via Select-String. In detail:
For each item in FileA.txt, I want to run Select-String against FileB.txt to discover whether it exists. If the line item in FileA.txt doesn't exist in FileB.txt, then print the FileA.txt line item to the screen.
This is what the text files look like, more or less:
FileA.txt
1
2
3
4
6
FileB.txt
6
7
8
9
10
Desired output would be the following:
1
2
3
4
This is what my PS code looks like now. My thought process was that I could use the variable within Select-String, but it's not working out for me :(
$IPs = Get-Content "C:\\FileA.txt"
Get-Content C:\FileB.txt | Select-String -InputObject $IPs
Could someone please help me out and point out what I am doing wrong.
Based on your limited sample data, here is an example of how you could do this:
"1 2 3 4 6" > "fileA.txt"
"6 7 8 9 10" > "fileB.txt"
$arrayA = (Get-Content "fileA.txt").Split(" ")
$arrayB = (Get-Content "fileB.txt").Split(" ")
$arrayResult = @()
foreach($valueA in $arrayA) {
if($arrayB -notcontains $valueA) {
$arrayResult += $valueA
}
}
$arrayResult -join " "
That said, I expect your real input files will eventually look quite different.
EDIT:
Using line breaks:
"1
2
3
4
6" > "fileA.txt"
"6
7
8
9
10" > "fileB.txt"
$arrayA = Get-Content "fileA.txt"
$arrayB = Get-Content "fileB.txt"
$arrayResult = @()
foreach($valueA in $arrayA) {
if($arrayB -notcontains $valueA) {
$arrayResult += $valueA
}
}
$arrayResult -join "`n"
NB: Both scripts begin by creating the input files; you probably won't need that step.
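For large files, note that -notcontains scans all of $arrayB for every element of $arrayA. A sketch using a .NET HashSet (the ::new() syntax needs PowerShell 5 or later) makes each lookup constant-time. Unlike -notcontains, it compares case-sensitively by default, which does not matter for numbers or IP addresses.

```powershell
$arrayA = '1', '2', '3', '4', '6'
$arrayB = '6', '7', '8', '9', '10'

# Build the lookup set once from the second list
$setB = [System.Collections.Generic.HashSet[string]]::new([string[]]$arrayB)

# Keep only the items of the first list that are not in the set
$arrayResult = $arrayA | Where-Object { -not $setB.Contains($_) }
$arrayResult -join "`n"
```

With real files, fill the arrays with $arrayA = Get-Content "fileA.txt" and $arrayB = Get-Content "fileB.txt" as in the script above.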
In this specific example, Compare-Object would probably be a better choice. It is designed to find the differences between two lists.
You would use something like:
Compare-Object -ReferenceObject $(gc .\FileA.txt) -DifferenceObject $(gc .\FileB.txt) | where { $_.SideIndicator -eq '<=' } | select -expand InputObject
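The same Compare-Object call on in-memory arrays, for a quick test (the arrays stand in for the two files). The SideIndicator value <= marks items that appear only in the -ReferenceObject list:

```powershell
$fileA = '1', '2', '3', '4', '6'
$fileB = '6', '7', '8', '9', '10'

# '<=' = present only in the reference list (FileA)
$onlyInA = Compare-Object -ReferenceObject $fileA -DifferenceObject $fileB |
    Where-Object SideIndicator -eq '<=' |
    Select-Object -ExpandProperty InputObject
$onlyInA
```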
However, you could also do this with select-string:
gc .\FileA.txt | select-string -Pattern $(gc .\FileB.txt) -NotMatch
This finds the lines of FileA that don't match any line of FileB. However, the lines of FileB are interpreted as regular expressions, which probably isn't appropriate for IP addresses, since '.' matches any character.
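One way around that, sketched here with made-up IP addresses: escape each FileB line with [regex]::Escape() and anchor it with ^ and $, so that '.' is literal and only whole-line matches count. Select-String's -SimpleMatch switch also makes the patterns literal, but it still matches substrings, so 10.0.0.1 would match 10.0.0.12.

```powershell
# Hypothetical contents of FileA.txt and FileB.txt
$fileA = '10.0.0.1', '10x0x0x1', '10.0.0.12'
$fileB = '10.0.0.1'

# Escape regex metacharacters and anchor each pattern to the whole line
$patterns = $fileB | ForEach-Object { '^' + [regex]::Escape($_) + '$' }

# Lines of A that match none of the patterns
$onlyInA = $fileA | Select-String -Pattern $patterns -NotMatch | ForEach-Object Line
$onlyInA
```

With the unescaped pattern, 10x0x0x1 (wildcard) and 10.0.0.12 (substring) would both have been treated as matches and dropped. For real files, build $patterns from Get-Content .\FileB.txt and pipe Get-Content .\FileA.txt into Select-String.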

Count tabs per line and return the lines with too many tabs

Looking for a PowerShell script that looks in a text file for rows that have too many (or too few) tabs.
I found this PowerShell script that does exactly what I want (almost).
This counts the number of tabs per row:
Get-Content test.txt | ForEach-Object {
($_ | Select-String `t -all).matches | Measure-Object | Select-Object count
}
Can someone extend/modify/re-write this to return only the rows (with row numbers) that have more than, or less than, X number of tabs per row?
Don't use Get-Content before piping to Select-String; you'll lose contextual information about each line.
Instead, use the -Path parameter with Select-String:
$Tabs = Select-String -Path .\test.txt -Pattern "`t" -AllMatches |
Select-Object LineNumber,Line,@{Name='TabCount';Expression={ $_.Matches.Count }}
$Tabs
To return only the lines where the number of tabs is greater than $x, use Where-Object:
$x = 3
$Tabs |Where-Object { $_.TabCount -gt $x } | Select-Object LineNumber,Line
If you just want a quick overview of the distribution, you could also use Group-Object:
Get-Content .\test.txt | Group-Object { "{0} tabs" -f [regex]::Matches($_,"`t").Count }
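One caveat with Select-String -Pattern "`t": it only emits lines that contain at least one tab, so a line with zero tabs never shows up, which matters when looking for rows with too few tabs. A sketch that checks every line against an expected count (the sample lines and $expected are illustrative):

```powershell
# Hypothetical rows: 2 tabs, 0 tabs, 1 tab
$lines = "a`tb`tc", "no tabs here", "x`ty"
$expected = 1

$lineNumber = 0
$report = foreach ($line in $lines) {
    $lineNumber++
    $tabCount = [regex]::Matches($line, "`t").Count
    # Report rows with more or fewer tabs than expected
    if ($tabCount -ne $expected) {
        [PSCustomObject]@{ LineNumber = $lineNumber; TabCount = $tabCount; Line = $line }
    }
}
$report
```

For a real file, set $lines = Get-Content .\test.txt.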
There are lots of ways to do this. Get-Content works just fine for me; here we create a custom object that you can then filter as desired.
Get-Content test.txt | ForEach-Object{
New-Object PSObject -Property @{
Line = $_
LineNumber = $_.ReadCount
NumberofTabs = [regex]::matches($_,"`t").count
}
}
This uses the .NET [regex]::Matches() method to count the tabs on each line and stores the result in the NumberofTabs property. Sample output:
NumberofTabs Number Line
------------ ------ ----
8 1 ;lkjasfdsa
8 2 asdfasdf
4 3 asdfasdfasdfa
2 4 fasdfjasdlfjas;l
Now you can filter as you see fit, for example by appending Where-Object after the closing brace of the ForEach-Object block above:
} | Where-Object { $_.NumberofTabs -ne 4}
So if 4 were the expected number of tabs, line 3 would be omitted from the results.