Getting rid of unwanted html in file - powershell

I have a file with the content below, and I am trying to remove everything from <!-- to -->:
<!--<br>
/* Font Definitions */
-->
Only keep this part

Don't use a regex. HTML isn't a regular language, so it can't be properly parsed with a regex. It will succeed most of the time, but other times will fail. Spectacularly.
I recommend opening the file and reading it one character at a time, looking for the characters <, !, -, followed by -. Then continue reading until you find -, -, followed by >.
$chars = [IO.File]::ReadAllText( $path ).ToCharArray()
$newFileContent = New-Object 'Text.StringBuilder'
$inComment = $false
for( $i = 0; $i -lt $chars.Length; ++$i )
{
    if( $inComment )
    {
        # A comment ends with -->
        if( $chars[$i] -eq '-' -and $chars[$i+1] -eq '-' -and $chars[$i+2] -eq '>' )
        {
            $inComment = $false
            $i += 2   # skip the rest of -->; the loop's ++$i steps past the >
        }
        continue
    }
    # A comment starts with <!--
    if( $chars[$i] -eq '<' -and $chars[$i+1] -eq '!' -and $chars[$i+2] -eq '-' -and $chars[$i+3] -eq '-' )
    {
        $inComment = $true
        $i += 3   # skip the rest of <!--; the loop's ++$i steps past the last -
        continue
    }
    # [void] keeps the StringBuilder returned by Append() out of the output
    [void] $newFileContent.Append( $chars[$i] )
}
$newFileContent.ToString() | Set-Content -Path $path

Regular expressions to the rescue again -
@'
<!--<br>
/* Font Definitions */
-->
Only keep this part
'@ -replace '(?s)<!--(.+?)-->', ''
(?s) makes dot match new lines :)
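To apply the same replacement to a file rather than a here-string, something like the following should work. This is only a sketch: the sample text is inlined so the snippet runs on its own, and the path in the commented-out lines is a hypothetical location, not one from the question. It uses .*? instead of .+? so that an empty comment (<!---->) is removed as well.

```powershell
# Sample text standing in for the file content (illustrative only)
$html = "<!--<br>`n/* Font Definitions */`n-->`nOnly keep this part"

# (?s) lets . match newlines; .*? is lazy, so each comment is matched separately
$clean = $html -replace '(?s)<!--.*?-->', ''

# Drop the blank line left behind where the comment used to be
$clean = $clean.Trim()
$clean

# For a real file, read it as one string with -Raw (PowerShell 3.0 or later):
# $path = 'C:\temp\page.html'   # hypothetical path
# (Get-Content -Path $path -Raw) -replace '(?s)<!--.*?-->', '' | Set-Content -Path $path
```

Without -Raw, Get-Content returns an array of lines and -replace runs on each line separately, so a comment that spans several lines would never match.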

Related

Powershell overwriting file contents with match instead of editing single line

I have a text file that contains a string I want to modify.
Example text file contents:
abc=1
def=2
ghi=3
If I run this code:
$file = "c:\test.txt"
$MinX = 100
$MinY = 100
$a = (Get-Content $file) | %{
if($_ -match "def=(\d*)"){
if($Matches[1] -gt $MinX){$_ -replace "$($Matches[1])","$($MinX)" }
}
}
$a
The result is:
def=100
If I omit the greater-than check like so:
$a = (Get-Content $file) | %{
if($_ -match "def=(\d*)"){
$_ -replace "$($Matches[1])","$($MinX)"
}
}
$a
The result is correct:
abc=1
def=100
ghi=3
I don't understand how a simple integer comparison before doing the replace could screw things up so badly, can anyone advise what I'm missing?
The comparison operator -gt does not do what you expect here, because $Matches[1] is a string and the comparison is therefore done on strings. Cast $Matches[1] to [int] first so that two integers are compared. As integers, 2 is never greater than 100, so change the operator to -lt instead.
Your code outputs only one line because you forgot to also output the unchanged lines that do not match the regex.
$file = 'c:\test.txt'
$MinX = 100
$MinY = 100
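$a = (Get-Content $file) | ForEach-Object {
    # raise the value only when the line matches and the number is below the minimum
    if ($_ -match '^def=(\d+)' -and [int]$Matches[1] -lt $MinX) {
        $_ -replace $Matches[1], $MinX
    }
    else {
        # pass every other line through unchanged
        $_
    }
}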
$a
Or use switch, which is also faster than using Get-Content:
$file = 'c:\test.txt'
$MinX = 100
$MinY = 100
$a = switch -Regex -File $file {
    '^def=(\d+)' {
        if ([int]$Matches[1] -lt $MinX) { $_ -replace $Matches[1], $MinX }
        else { $_ }   # value is already at or above the minimum: keep the line
    }
    default { $_ }
}
$a
Output:
abc=1
def=100
ghi=3
That's because the expression ($Matches[1] -gt $MinX) is a string comparison. In PowerShell, the left-hand side of a comparison dictates the comparison type, and since that is of type [string], PowerShell has to convert the right-hand side of the expression to [string] as well. Your expression, therefore, is evaluated as ([string]$Matches[1] -gt [string]$MinX).
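The difference is easy to see in isolation. The snippet below is a small illustration using the values from the question; the variable names $value, $asString and $asInt are made up for the example.

```powershell
$MinX  = 100
$value = '2'                        # what $Matches[1] holds: a string, not a number

# Left side is a string, so $MinX is converted to '100' and the strings are compared
$asString = $value -gt $MinX

# Left side is an integer, so the comparison is numeric
$asInt = [int]$value -gt $MinX

"String comparison:  $asString"
"Integer comparison: $asInt"
```

The string comparison is true because '2' sorts after '1', the first character of '100', which is why the original -gt check let the replacement happen at all.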

How do I loop through a string read from a file against a fixed set of characters individually and if they match print the string in powershell

I currently have a foreach loop, that gets content from a small dictionary file (only strings over 3 characters). I am looking to compare each character in the $line against my other characters, in this case "b" "i" "n" "g" "o" so that if all the characters in $line are in bingo, then it prints the word. If not it loops to the next word.
So far I have:
foreach($line in Get-Content Desktop/dict.txt | Sort-Object Length, { $_ })
The bit I can't get (not too familiar with powershell) is this:
if($line.length -gt 3){
if( i in $line == 'b')
if( i in $line == 'i')
if( i in $line == 'n')
if( i in $line == 'g')
if( i in $line == 'o')
write-output $line
}
}
If I understood correctly, and you want to check whether $line is contained in bingo, you could use the -match operator (case-insensitive) or the -cmatch operator (case-sensitive). See Comparison Operators.
For example:
PS /> 'bingo' -match 'ing'
True
PS /> 'bingo' -match 'bin'
True
PS /> 'bingo' -match 'ngo'
True
The code could look like this:
foreach($line in Get-Content Desktop/dict.txt | Sort-Object Length, { $_ })
{
if($line.length -gt 3 -and 'bingo' -match $line)
{
$line
# you can add break here to stop this loop if the word is found
}
}
Edit
If you want to check whether 3 or more characters of bingo (in any order) are contained in $line, there are many ways to do this. This is the approach I would take:
# Insert magic word here
$magicWord = 'bingo'.ToCharArray() -join '|'
foreach($line in Get-Content Desktop/dict.txt | Sort-Object Length, { $_ })
{
# Remove [Text.RegularExpressions.RegexOptions]::IgnoreCase if you want it to be Case Sensitive
$check = [regex]::Matches($line,$magicWord,[Text.RegularExpressions.RegexOptions]::IgnoreCase)
# If 3 or more unique characters were matched
if(($check.Value | Select-Object -Unique).count -ge 3)
{
'Line is: {0} || Characters Matched: {1}' -f $line,-join $check.Value
}
}
Demo
Given the following words:
$test = 'ngob','ibgo','gn','foo','var','boing','ingob','oubingo','asdBINGasdO!'
It would yield:
Line is: ngob || Characters Matched: ngob
Line is: ibgo || Characters Matched: ibgo
Line is: boing || Characters Matched: boing
Line is: ingob || Characters Matched: ingob
Line is: oubingo || Characters Matched: obingo
Line is: asdBINGasdO! || Characters Matched: BINGO
So you want to get back any words that are the same length and have the same characters no matter the order?
$dict = @(
'bingo'
'rambo'
'big'
'gobin'
'bee'
'ebe'
'been'
'ginbo'
)
$word = 'bingo'
$dict |
Where-Object { $_.length -eq $word.Length } |
ForEach-Object {
$dictwordLetters = [System.Collections.Generic.List[char]]::new($_.ToCharArray())
$word.ToCharArray() | ForEach-Object {
$dictwordLetters.Remove($_) | Out-Null
}
if (-not $dictwordLetters.Count) {
$_
}
}
The following will be the output
bingo
gobin
ginbo
By taking parts of both answers I was able to get the result I was after. As I am new to this, I am not sure how to thank @martin and @santiago for their work.
This was the code that was put together, which was pretty much taking the dictionary file and then rather than a fixed string size made it greater than 3:
$dict = @(Get-Content Desktop/dict.txt | Sort-Object Length, { $_ })
$word = 'bingo'
$dict |
Where-Object { $_.length -gt 2 } |
ForEach-Object {
$dictwordLetters = [System.Collections.Generic.List[char]]::new($_.ToCharArray())
$word.ToCharArray() | ForEach-Object {
$dictwordLetters.Remove($_) | Out-Null
}
if (-not $dictwordLetters.Count) {
$_
}
}
Your assistance was greatly appreciated.
Here's my two cents:
$dict = 'apple', 'brown', 'green', 'cake', 'bin', 'pear', 'big', 'milk', 'bio', 'bong', 'bingo', 'bodings', 'gibson'
# the search term as string
$term = 'bingo'
# merge the unique characters into a regex like '[bingo]+'
$chars = '[{0}]+' -f (($term.ToCharArray() | Select-Object -Unique) -join '')
# loop over the array (lines in the text file)
$dict | ForEach-Object {
# get all matched characters, uniqify and join if there are more matches.
$found = (($_ | Select-String -Pattern $chars -AllMatches).Matches.Value | Select-Object -Unique ) -join '' | Where-Object { $_.Length -ge 3 }
if ($found) {
# outputting an object better explains what is matched in which line
[PsCustomObject]@{
Line = $_
CharactersMatched = $found
}
# of course, you can also simply output the found matching characters
# $found
}
}
Output:
Line    CharactersMatched
----    -----------------
brown   bon
bin     bin
big     big
bio     bio
bong    bong
bingo   bingo
bodings boing
gibson  gibon
The previous answers all seem overly complicated. If you are trying to match strings then that sounds like a problem that requires a regular expression, and if that is the case then Select-String would be a better option than Get-Content. Below is an example, I am not sure if it is exactly right for your needs but should point you in the right direction:
Select-String 'Desktop/dict.txt' -pattern '^[bingo]{3,}$'
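To see what that pattern accepts without touching a file, the same regular expression can be tried against a few in-memory words. The word list below is invented for the illustration.

```powershell
# Hypothetical sample words; in practice these would be the lines of dict.txt
$words = 'bin', 'bingo', 'bong', 'brown', 'go', 'noon', 'bodings'

# ^[bingo]{3,}$ : the whole word must consist of b, i, n, g or o, at least 3 characters
$matched = $words | Where-Object { $_ -match '^[bingo]{3,}$' }
$matched
```

Note that a character class does not limit how often a letter is used, so a word like noon passes even though bingo contains only one n. If each letter may be used only once, the List[char].Remove approach shown earlier is the stricter check.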

Parse a list line by line, create a new list in Powershell

I need to read in a file that contains lines of source/destination IPs and ports as well as a tag. I'm using Get-Content:
Get-Content $logFile -ReadCount 1 | % {
} | sort | get-unique | Out-File "C:\Log\logout.txt"
This is an example of the input file:
|10.0.0.99|345|195.168.4.82|58164|spam|
|10.0.0.99|345|195.168.4.82|58164|robot|
|10.0.0.99|231|195.168.4.82|58162|spam|
|195.168.4.82|58162|10.0.0.99|231|robot|
|10.0.0.99|345|195.168.4.82|58168|spam|
|10.0.0.99|345|195.168.4.82|58169|spam|
What I need to do is output a new list, but if the same source/destination IPs/ports are both 'spam' and 'robot' I just need to output that line as 'robot' (lines 1 and 2 above).
I need to do the same if the reverse direction of an existing connection is either 'spam' or 'robot', I just need one or the other and it would be 'robot' (lines 3 and 4 above). There will be plenty of 'spam' lines without a duplicate or reverse connection (the last couple lines above), they need to just stay the same.
This is what i've been using to create the reverse direction of the connection, but I haven't been able to figure out how to properly create the new list:
$reverse = '|' + ($_.Split("|")[3,4,1,2,5] -join '|') + '|'
The desired output for the input above would be:
|10.0.0.99|345|195.168.4.82|58164|robot|
|195.168.4.82|58162|10.0.0.99|231|robot|
|10.0.0.99|345|195.168.4.82|58168|spam|
|10.0.0.99|345|195.168.4.82|58169|spam|
(except that second line didn't have to be the reversed direction)
Thanks for any help!
Since both direct and reverse connections are checked and their line order may not be sequential, I would use a hashtable to store the type of both directions and do everything algorithmically:
$checkPoints = @{}
$output = [ordered]@{}
$reader = [IO.StreamReader]'R:\1.txt'
while (!$reader.EndOfStream) {
$line = $reader.ReadLine()
$s = $line.split('|')
$direct = [string]::Join('|', $s[1..4])
$reverse = [string]::Join('|', ($s[3,4,1,2]))
$type = $s[5]
$known = $checkPoints[$direct]
if (!$known -or ($type -eq 'robot' -and $known -eq 'spam')) {
$checkPoints[$direct] = $checkPoints[$reverse] = $type
$output[$direct] = $line
$output.Remove($reverse)
}
# a 'spam' line for a connection already recorded as 'robot' is ignored;
# removing $reverse here would delete a 'robot' line seen in the other direction
}
$reader.Close()
Set-Content r:\2.txt -Encoding utf8 -value @($output.Values)
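To make the key handling easier to follow, here is how a single line from the question's sample input is taken apart. This only illustrates the splitting step and is not part of the script above.

```powershell
$line = '|10.0.0.99|345|195.168.4.82|58164|spam|'

# The leading and trailing '|' produce empty elements at index 0 and 6
$s = $line.Split('|')

$direct  = $s[1..4] -join '|'      # source IP, source port, destination IP, destination port
$reverse = $s[3,4,1,2] -join '|'   # the same connection seen from the other side
$type    = $s[5]

$direct
$reverse
$type
```

Because both $direct and $reverse are stored in $checkPoints, a later line for the same connection finds the recorded type no matter which direction it travels in.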

I want to check if an element exist in an array

I want to check if an element exists in an array.
$data = "100400296 676100 582"
$i = "18320-my-turn-582"
if ($data -like $i) { Write-Host "Exist" }
else { Write-Host "Didn't exist" }
This example doesn't work the way I want. $i contains 582, so I want the result to be Exist.
Your string "18320-my-turn-582" doesn't exist in $data, even though both strings contain the substring 582.
PowerShell treats your strings as a whole, and 18320-my-turn-582 is not present in 100400296 676100 582. To work around this you can:
Use Regex:
$i -match '\d+$'
$data -match $Matches[0]
Split the $i at hyphens so you will have:
$i = $i -split '-'
# turns $i into an array with the elements:
# 18320
# my
# turn
# 582
$data -match $i[-1]
# Output: True ($data is a single string, so -match returns a Boolean)
Check out Get-Help about_Comparison_Operators to understand the differences between -Contains, -Match and -Like operators.
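As a rough side-by-side of those operators on the values from the question (the variable names for the results are made up for this sketch):

```powershell
$data = '100400296 676100 582'
$i    = '18320-my-turn-582'

$last = ($i -split '-')[-1]                          # '582'

$likeWhole    = $data -like $i                       # no wildcards: the whole strings must be equal
$likeWildcard = $data -like "*$last*"                # wildcard match anywhere in the string
$matchToken   = $data -match "\b$last\b"             # regex match on 582 as a whole token
$containsItem = ($data -split ' ') -contains $last   # membership test on an array of tokens

"like (whole string): $likeWhole"
"like (wildcards):    $likeWildcard"
"match (whole token): $matchToken"
"contains (array):    $containsItem"
```

Only the first test is false. The \b word boundaries and the -contains variant avoid a false positive when 582 merely appears inside a longer number such as 15820.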

Powershell - How to display next $line in a foreach loop

I am currently parsing strings from .cpp files and need a way to display string blocks of multiple lines using the _T syntax. To exclude one line _T strings, I included a -notmatch ";" parameter to exclude them. This also excludes the last line of the string block, which I need. So I need to display the next string, so that the last string block with ";" is included.
I tried $foreach.moveNext() | out-file C:/T_Strings.txt -append but no luck.
Any help would be greatly appreciated. :)
foreach ($line in $allLines)
{
$lineNumber++
if ($line -match "^([0-9\s\._\)\(]+$_=<>%#);" -or $line -like "*#*" -or $line -like "*\\*" -or $line -like "*//*" -or $line -like "*.dll* *.exe*")
{
continue
}
if ($line -notlike "*;*" -and $line -match "_T\(\""" ) # Multiple line strings
{
$line | out-file C:/T_Strings.txt -append
$foreach.moveNext() | out-file C:/T_Strings.txt -append
}
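In your sample, $foreach.MoveNext() only advances the loop's enumerator and returns $true or $false. It does not return the next line, so piping it to Out-File writes a Boolean instead of the line. If you want explicit control over the iteration, you can create your own enumerator:
$iter = $allLines.GetEnumerator()
while( $iter.MoveNext() )
{
    $line = $iter.Current
    # process $line here; call $iter.MoveNext() again to advance to the following line
}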
I would recommend you don't use regular expressions, though. Parse the C++ files instead. Here's the simplest thing I could think of to parse out all _T strings. It doesn't handle:
commented out _T strings
a ") in the _T string
a _T string at the end of a file.
You'll have to add those checks yourself. If you only want multi-line _T strings, you'll have to filter out single line strings, too.
$inString = $false
$strings = @()
$currentString = $null
$file = $allLines -join "`n"
$chars = $file.ToCharArray()
for( $idx = 0; $idx -lt $chars.Length; ++$idx )
{
    $currChar = $chars[$idx]
    $nextChar = $chars[$idx + 1]
    $thirdChar = $chars[$idx + 2]
    $fourthChar = $chars[$idx + 3]
    # See if the current character is the start of a new _T token
    if( -not $inString -and $currChar -eq '_' -and $nextChar -eq 'T' -and $thirdChar -eq '(' -and $fourthChar -eq '"' )
    {
        $idx += 3
        $inString = $true
        continue
    }
    if( $inString )
    {
        # A ") sequence closes the current string
        if( $currChar -eq '"' -and $nextChar -eq ')' )
        {
            $inString = $false
            if( $currentString )
            {
                $strings += $currentString
            }
            $currentString = $null
        }
        else
        {
            $currentString += $currChar
        }
    }
}
Figured out the syntax to do this:
$foreach.movenext()
$foreach.current | out-file C:/T_Strings.txt -append
You need to move to the next, then pipe the current foreach value.
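A small self-contained illustration of that behaviour, with an invented list standing in for $allLines and no file output:

```powershell
$allLines = 'first', 'second', 'third', 'fourth'   # hypothetical stand-in for the file's lines

$result = foreach ($line in $allLines) {
    if ($line -eq 'second') {
        # MoveNext() returns $true/$false, so discard that and read .Current instead
        $null = $foreach.MoveNext()
        "$line -> $($foreach.Current)"
    }
    else {
        $line
    }
}
$result
```

Because MoveNext() advances the same enumerator the loop is using, the loop body never runs for third on its own; keep that in mind if every line still needs to go through the other checks.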