I have a text file with a large number of log messages.
I want to extract the messages between two string patterns. I want the extracted message to appear as it is in the text file.
I tried the following method. It works, but it doesn't support Get-Content's -Wait and -Tail options. Also, the extracted results are displayed on a single line rather than as they appear in the text file. Inputs are welcome :-)
Sample Code
function GetTextBetweenTwoStrings($startPattern, $endPattern, $filePath){
# Get content from the input file
$fileContent = Get-Content $filePath
# Regular expression (Regex) of the given start and end patterns
$pattern = "$startPattern(.*?)$endPattern"
# Perform the Regex operation
$result = [regex]::Match($fileContent,$pattern).Value
# Finally return the result to the caller
return $result
}
# Clear the screen
Clear-Host
$input = "THE-LOG-FILE.log"
$startPattern = 'START-OF-PATTERN'
$endPattern = 'END-OF-PATTERN'
# Call the function
GetTextBetweenTwoStrings -startPattern $startPattern -endPattern $endPattern -filePath $input
Improved script based on Theo's answer.
The following points need to be improved:
The beginning and end of the output are somehow trimmed, even though I adjusted the buffer size in the script.
How to wrap each matched result into START and END string?
Still I could not figure out how to use the -Wait and -Tail options
Updated Script
# Clear the screen
Clear-Host
# Adjust the buffer size of the window
$bw = 10000
$bh = 300000
if ($host.name -eq 'ConsoleHost') # or -notmatch 'ISE'
{
[console]::bufferwidth = $bw
[console]::bufferheight = $bh
}
else
{
$pshost = get-host
$pswindow = $pshost.ui.rawui
$newsize = $pswindow.buffersize
$newsize.height = $bh
$newsize.width = $bw
$pswindow.buffersize = $newsize
}
function Get-TextBetweenTwoStrings ([string]$startPattern, [string]$endPattern, [string]$filePath){
# Get content from the input file
$fileContent = Get-Content -Path $filePath -Raw
# Regular expression (Regex) of the given start and end patterns
$pattern = '(?is){0}(.*?){1}' -f [regex]::Escape($startPattern), [regex]::Escape($endPattern)
# Perform the Regex operation and output
[regex]::Match($fileContent,$pattern).Groups[1].Value
}
# Input file path
$inputFile = "THE-LOG-FILE.log"
# The patterns
$startPattern = 'START-OF-PATTERN'
$endPattern = 'END-OF-PATTERN'
Get-TextBetweenTwoStrings -startPattern $startPattern -endPattern $endPattern -filePath $inputFile
You need to perform streaming processing of your Get-Content call, in a pipeline, such as with ForEach-Object, if you want to process lines as they're being read.
This is a must if you're using Get-Content -Wait, because such a call doesn't terminate by itself (it keeps waiting for new lines to be added to the file, indefinitely), but inside a pipeline its output can be processed as it is being received, even before the command terminates.
You're trying to match across multiple lines, which with Get-Content output would only work if you used the -Raw switch - by default, Get-Content reads its input file(s) line by line.
However, -Raw is incompatible with -Wait.
Therefore, you must stick with line-by-line processing, which requires that you match the start and end patterns separately, and keep track of when you're processing lines between those two patterns.
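To see the difference between the two read modes, here is a small sketch that uses a hypothetical temporary file (the name raw-demo.log is an assumption, not from the question). It also suggests why the original attempt printed everything on one line: when an array of lines is used where a single string is expected, PowerShell joins the elements with spaces.

```powershell
# Hypothetical temp file standing in for the real log.
$path = Join-Path ([IO.Path]::GetTempPath()) 'raw-demo.log'
Set-Content -Path $path -Value 'line 1', 'line 2', 'line 3'

# Default: an array with one string per line.
$asLines = Get-Content -Path $path

# -Raw: a single string that still contains the newlines.
$asString = Get-Content -Path $path -Raw

# Converting the array to a string joins the lines with spaces,
# so the original line breaks are lost.
$joined = "$asLines"

Remove-Item -Path $path
```

Here $asLines holds three separate strings, $asString is one multi-line string, and $joined is the space-separated single line.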
Here's a proof of concept, but note the following:
-Tail 100 is hard-coded - adjust as needed or make it another parameter.
The use of -Wait means that the function will run indefinitely - waiting for new lines to be added to $filePath - so you'll need to use Ctrl-C to stop it.
While you can use a Get-TextBetweenTwoStrings call itself in a pipeline for object-by-object processing, assigning its result to a variable ($result = ...) won't work when terminating with Ctrl-C, because this method of termination also aborts the assignment operation.
To work around this limitation, the function below is defined as an advanced function, which automatically enables support for the common -OutVariable parameter, which is populated even in the event of termination with Ctrl-C; your sample call would then look as follows (as Theo notes, don't use the automatic $input variable as a custom variable):
# Look for blocks of interest in the input file, indefinitely,
# and output them as they're being found.
# After termination with Ctrl-C, $result will also contain the blocks
# found, if any.
Get-TextBetweenTwoStrings -OutVariable result -startPattern $startPattern -endPattern $endPattern -filePath $inputFile
Per your feedback you want the block of lines to encompass the full lines on which the start and end patterns match, so the regexes below are enclosed in .*
The word pattern in your $startPattern and $endPattern parameters is a bit ambiguous in that it suggests that they themselves are regexes that can therefore be used as-is or embedded as-is in a larger regex on the RHS of the -match operator.
However, in the solution below I am assuming that they are to be treated as literal strings, which is why they are escaped with [regex]::Escape(); simply omit these calls if these parameters are indeed regexes themselves; i.e.:
$startRegex = '.*' + $startPattern + '.*'
$endRegex = '.*' + $endPattern + '.*'
The solution assumes there is no overlap between blocks and that, in a given block, the start and end patterns are on separate lines.
Each block found is output as a single, multi-line string, using LF ("`n") as the newline character; if you want CRLF newline sequences instead, use "`r`n"; for the platform-native newline format (CRLF on Windows, LF on Unix-like platforms), use [Environment]::NewLine.
# Note the use of "-" after "Get", to adhere to PowerShell's
# "<Verb>-<Noun>" naming convention.
function Get-TextBetweenTwoStrings {
# Make the function an advanced one, so that it supports the
# -OutVariable common parameter.
[CmdletBinding()]
param(
$startPattern,
$endPattern,
$filePath
)
# Note: If $startPattern and $endPattern are themselves
# regexes, omit the [regex]::Escape() calls.
$startRegex = '.*' + [regex]::Escape($startPattern) + '.*'
$endRegex = '.*' + [regex]::Escape($endPattern) + '.*'
$inBlock = $false
$block = [System.Collections.Generic.List[string]]::new()
Get-Content -Tail 100 -Wait $filePath | ForEach-Object {
if ($inBlock) {
if ($_ -match $endRegex) {
$block.Add($Matches[0])
# Output the block of lines as a single, multi-line string
$block -join "`n"
$inBlock = $false; $block.Clear()
}
else {
$block.Add($_)
}
}
elseif ($_ -match $startRegex) {
$inBlock = $true
$block.Add($Matches[0])
}
}
}
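To try the block-tracking logic without a growing log file, the following sketch runs the same kind of state machine over an in-memory array. The sample lines are made up for illustration. For brevity it matches the patterns anywhere in the line and adds the whole line, rather than wrapping the patterns in .* as the function above does.

```powershell
# Made-up sample lines standing in for the log file content.
$lines = @(
    'noise before'
    '2021-01-01 START-OF-PATTERN first'
    'payload line 1'
    'payload line 2'
    '2021-01-01 END-OF-PATTERN last'
    'noise after'
)

$startRegex = [regex]::Escape('START-OF-PATTERN')
$endRegex   = [regex]::Escape('END-OF-PATTERN')

$inBlock = $false
$block   = [System.Collections.Generic.List[string]]::new()

$blocks = $lines | ForEach-Object {
    if ($inBlock) {
        # Inside a block: collect the line, then check for the end pattern.
        $block.Add($_)
        if ($_ -match $endRegex) {
            # Emit the finished block as one multi-line string and reset.
            $block -join "`n"
            $inBlock = $false
            $block.Clear()
        }
    }
    elseif ($_ -match $startRegex) {
        # A start line opens a new block.
        $inBlock = $true
        $block.Add($_)
    }
}
```

With this input, $blocks contains one string made of the four lines from the start line through the end line; the two noise lines are skipped.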
First of all, you should not use $input as a self-defined variable name, because it is an automatic variable.
Then, you are reading the file as a string array, where you would rather read it as a single, multiline string. For that, append the -Raw switch to the Get-Content call.
The regex you are creating does not allow for regex special characters in the start and end patterns you give, so I would suggest using [regex]::Escape() on these patterns when creating the regex string.
While your regex does use a capturing group inside the parentheses, you are not using it when it comes to getting the value you seek.
Finally, I would recommend using the PowerShell naming convention (Verb-Noun) for the function name.
Try
function Get-TextBetweenTwoStrings ([string]$startPattern, [string]$endPattern, [string]$filePath){
# Get content from the input file
$fileContent = Get-Content -Path $filePath -Raw
# Regular expression (Regex) of the given start and end patterns
$pattern = '(?is){0}(.*?){1}' -f [regex]::Escape($startPattern), [regex]::Escape($endPattern)
# Perform the Regex operation and output
[regex]::Match($fileContent,$pattern).Groups[1].Value
}
$inputFile = "D:\Test\THE-LOG-FILE.log"
$startPattern = 'START-OF-PATTERN'
$endPattern = 'END-OF-PATTERN'
Get-TextBetweenTwoStrings -startPattern $startPattern -endPattern $endPattern -filePath $inputFile
Would result in something like:
blahblah
more lines here
The (?is) makes the regex case-insensitive and have the dot match linebreaks as well
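As a quick check of what the two inline options do, here is a sketch on an in-memory string (the sample text is invented for illustration). The start marker is deliberately lowercase and the block spans several lines, so the match succeeds only with (?is) in place.

```powershell
# Invented sample text: lowercase start marker, multi-line body.
$text = "start-of-pattern`nblahblah`nmore lines here`nEND-OF-PATTERN"

$pattern = '(?is){0}(.*?){1}' -f [regex]::Escape('START-OF-PATTERN'), [regex]::Escape('END-OF-PATTERN')

# With (?is): case-insensitive, and the dot also matches line breaks.
$captured = [regex]::Match($text, $pattern).Groups[1].Value

# Same pattern with the options stripped: no match at all.
$plainPattern = $pattern.Replace('(?is)', '')
$matchedWithoutOptions = [regex]::Match($text, $plainPattern).Success
```

$captured holds the two inner lines including the surrounding newlines, while $matchedWithoutOptions is $false.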
Nice to see you're using my version of the Get-TextBetweenTwoStrings function. However, I believe you are mistaking console output for what you would see in a dedicated text editor. In the console, lines that are too long are truncated, whereas in a text editor like Notepad you can choose to wrap long lines or use a horizontal scrollbar.
If you simply append
| Set-Content -Path 'X:\wherever\theoutput.txt'
to the Get-TextBetweenTwoStrings .. call, you will find the lines are NOT truncated when you open it in Word or notepad for instance.
In fact, you can have that line followed by
notepad 'X:\wherever\theoutput.txt'
to have notepad open that file straight away.
Related
I have the following powershell code I wrote that I am trying to optimize even further. I essentially need to get this code block down to under 259 characters. It's currently at 319, this is a challenge I know.
Using a mix of regex/wildcard matching/code golfing I think it is possible. But this is something I'm still learning.
This function converts each character in the z file into its Caps Lock and Num Lock values, then uses SendKeys to, put very simply, use the lock-key lights as a form of Morse code, but in binary format.
I need this to run from the run box hence the character limit.
Why am I doing this? I'm passing data through the channel that controls the lock keys on the keyboard.
powershell "foreach($b in $(cat $env:tmp\z -En by)){foreach($a in 0x80,
0x40,0x20,0x10,0x08,0x04,0x02,0x01){if($b-band$a){$o+='%{NUMLOCK}'}else
{$o+='%{CAPSLOCK}'}}};$o+='%{SCROLLLOCK}';echo $o >$env:tmp\z;$f=(cat $env:tmp\z);Add-Type -A System.Windows.Forms;[System.Windows.Forms.SendKeys]::SendWait($f);rm $env:tmp\z"
I'm at work so I haven't gotten to test it yet, but I think I got it down to 268 characters. And again, it needs to be at 259 or less.
powershell "$d='$env:tmp\z'%($b in $(cat -En by)){%($a in 0x80,
0x40,0x20,0x10,0x08,0x04,0x02,0x01){if($b-band$a){$o+='%{NUMLOCK}'}else
{$o+='%{CAPSLOCK}'}}};$o+='%{SCROLLLOCK}';echo $o >$d;$o=(cat $d);Add-Type -A *m.W*s.F*s;[*m.W*s.F*s.SendKeys]::SendWait($o);rm $d"
You'll know your solution works if you have a file in the tmp folder called "z" with no extension, and after running your code the lights for your lock keys look like a rave.
Just to post the entirety of the code in readable format, here's the result with Mclayton's suggestion included:
# Gather the content from the file
# The use of (...) around the variable assignment lets the assigned value pass through as the value of the expression.
Get-Content ($d=".\desktop\Abe.txt") -Encoding byte |
ForEach-Object -Process {
# Assign the current object in the pipeline to $b.
# This will allow the use of another Foreach-Object (%) for shorter code.
$b = $_;
128,64,32,16,8,4,2,1 |
Foreach-Object -Process {
# Append the results to $o as a concatenated string.
# Given a hashtable with the wanted values, you can access the value by providing the name in []'s.
# Since the bitwise operator -BAND only operates "properly" on two equal-length binary representations, an if statement is needed.
# The if statement will return values 1/0 in accordance with -BAND
$o += "%{$(@{1="NUM";0="CAPS"}[$( if ($_-band$b) { 1 } else { 0 } )])LOCK}%{SCROLLLOCK}"
}
};
# Concatenate again to $o, while assigning to $f then outputting to $d.
($f = $o + "%{SCROLLLOCK}") | Out-File -FilePath $d;
Add-Type -AssemblyName 'System.Windows.Forms';
[System.Windows.Forms.SendKeys]::SendWait($f);
Remove-Item -Path $d
...and in shorter form:
gc ($d="$env:TEMP\z") -en by|%{$b=$_;128,64,32,16,8,4,2,1|%{$o+="%{$(@{1="NUM";0="CAPS"}[$(if($_-band$b){1}else{0})])LOCK}%{SCROLLLOCK}"}};($f=$o+"%{SCROLLLOCK}")>$d;Add-Type -A *m.W*s.F*s;[System.Windows.Forms.SendKeys]::SendWait($f);rm $d
I added comments in the code just for future readers trying to follow along.
I have a file prototype as follows:
// <some stuff>
#define KEYWORD release01-11
// <more stuff>
How can I delete the last two characters in the same line as KEYWORD and replace them with two different characters (12 in this case), in order to end up with:
// <some stuff>
#define KEYWORD release01-12
// <more stuff>
I'm trying to use Clear-Content and Add-Content but I cannot get it to do what I need. The rest of the file needs to remain unchanged after these symbols have been replaced. Is there a better alternative?
Use the -replace regex operator to identify the relevant statements and replace/remove the trailing numbers:
# read file into a variable
$code = Get-Content myfile.c
# replace the trailing two digits (the XX in -XX) with 12 in all lines containing `#define KEYWORD`
$code = $code -replace '(?<=#define KEYWORD .+-)\d{2}\s*$','12'
# write the contents back to the file
$code |Set-Content myfile.c
The regex construct (?<=...) is a positive lookbehind - it ensures that the following expression will only match at a position where text right behind it is #define KEYWORD followed by some characters and a -.
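To illustrate how the lookbehind restricts the replacement, here is a sketch on an in-memory array instead of a file. The `#define OTHER` line is an invented counter-example that ends in the same digits but must stay untouched.

```powershell
# In-memory stand-in for the file; the OTHER line is invented for contrast.
$code = @(
    '// <some stuff>'
    '#define KEYWORD release01-11'
    '#define OTHER release01-11'
)

# Only digits preceded by "#define KEYWORD <something>-" are replaced.
$updated = $code -replace '(?<=#define KEYWORD .+-)\d{2}\s*$', '12'
```

After this, only the second element changes (to `#define KEYWORD release01-12`). The lookbehind text itself is not part of the match, so it does not need to be repeated in the replacement string.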
If you want to always increment the current value (as opposed to just replacing it with 12), we'll need some way to inspect and evaluate the current value before doing the substitution.
The [Regex]::Replace() method allows for just that:
# read file into a variable
$code = Get-Content myfile.c
$code = $code |ForEach-Object {
# Same as before, but now we can hook into the regex engine's substitution routine
[regex]::Replace($_, '(?<=#define KEYWORD .+-)\d{2}\s*$',{
param($m)
# extract the trailing numbers, convert to a numerical type
$value = $m.Value -as [int]
# increment the value
$value++
# return the new value
return $value
})
}
# write the contents back to the file
$code |Set-Content myfile.c
In PowerShell 6.1 and up, the -replace operator natively supports scriptblock substitutions:
$code = $code |ForEach-Object {
# Same as before, but now we can hook into the regex engine's substitution routine
$_ -replace '(?<=#define KEYWORD .+-)\d{2}\s*$',{
# extract the trailing numbers, convert to a numerical type
$value = $_.Value -as [int]
# increment the value
$value++
# return the new value
return $value
}
}
I'm trying to use a list of phrases (over 100) that I want removed from a text file (products.txt), which contains lines of text (tab-separated, one per line). The lines that do not match any phrase in the list should be written back to the same file.
#cd .\Desktop\
$productlist = @(
'example',
'juicebox',
'telephone',
'keyboard',
'manymore')
foreach ($product in $productlist) {
get-childitem products.txt | Select-String -Pattern $product -NotMatch | foreach {$_.line} | Out-File -FilePath .\products.txt
}
The above code does not remove the words listed in $productlist; it simply outputs all lines in products.txt again.
The lines inside of products.txt file are these:
productcatalog
product1example
juicebox038
telephoneiphone
telephoneandroid
randomitem
logitech
coffeetable
razer
Thank you for your help.
Here's my solution. You need the parentheses otherwise the input file will be in use when trying to write to the file. Select-string accepts an array of patterns. I wish I could pipe 'path' to set-content but it doesn't work.
$productlist = 'example', 'juicebox', 'telephone', 'keyboard', 'manymore'
(Select-String $productlist products.txt -NotMatch) | % line |
set-content products.txt
here's one way to do what you want. it's somewhat more direct than what you used. [grin] it uses the way that PoSh can act on an entire collection when it is on the LEFT side of an operator.
what it does ...
fakes reading in a text file
when ready to do this in real life, replace the whole #region/#endregion block with a call to Get-Content.
builds the exclude list
converts that into a regex OR pattern
filters out the items that match the unwanted list
shows that resulting list
the code ...
#region >>> fake reading in a text file
# when ready to do this for real, replace the whole "#region/#endregion" block with a call to Get-Content
$ProductList = @'
productcatalog
product1example
juicebox038
telephoneiphone
telephoneandroid
randomitem
logitech
coffeetable
razer
'@ -split [System.Environment]::NewLine
#endregion >>> fake reading in a text file
$ExcludedProductList = @(
'example'
'juicebox'
'telephone'
'keyboard'
'manymore'
)
$EPL_Regex = $ExcludedProductList -join '|'
$RemainingProductList = $ProductList -notmatch $EPL_Regex
$RemainingProductList
output ...
productcatalog
randomitem
logitech
coffeetable
razer
I have a requirement to read a data file line by line and then do string/character replacement; the data in the file is encoded in Windows Latin-1.
I wrote this PowerShell script (my first one), initially using the Out-File -Encoding option. However, the output file created that way had some characters translated. I then searched and came across WriteAllLines, but I'm unable to use it in my code.
$encoding =[Text.Encoding]::GetEncoding('iso-8859-1')
$pdsname="ABCD.XYZ.PQRST"
$datafile="ABCD.SCHEMA.TABLE.DAT"
Get-Content ABCD.SCHEMA.TABLE.DAT | ForEach-Object {
$matches = [regex]::Match($_,'ABCD')
$string_to_be_replaced=$_.substring($matches.Index,$pdsname.Length+10)
$string_to_be_replaced="`"$string_to_be_replaced`""
$member = [regex]::match($_,"`"$pdsname\(([^\)]+)\)`"").Groups[1].Value
$_ -replace $([regex]::Escape($string_to_be_replaced)),$member
} | [System.IO.File]::WriteAllLines("C:\Users\USer01", "ABCD.SCHEMA.TABLE.NEW.DAT", $encoding)
With the help of an answer from @Gzeh Niert, I updated the script above. However, when I execute it, the generated output file contains just the last record: instead of appending, it overwrote the file. I then tried [System.IO.File]::AppendAllText, but strangely this creates a larger file that still has only the last record. In short, it's likely that empty lines are being written.
param(
[String]$datafile
)
$pdsname="ABCD.XYZ.PQRST"
$encoding =[Text.Encoding]::GetEncoding('iso-8859-1')
$datafile = "ABCD.SCHEMA.TABLE.DAT"
$datafile2="ABCD.SCHEMA.TABLE.NEW.DAT"
Get-Content $datafile | ForEach-Object {
$matches = [regex]::Match($_,'ABCD')
if($matches.Success) {
$string_to_be_replaced=$_.substring($matches.Index,$pdsname.Length+10)
$string_to_be_replaced="`"$string_to_be_replaced`""
$member = [regex]::match($_,"`"$pdsname\(([^\)]+)\)`"").Groups[1].Value
$replacedContent = $_ -replace $([regex]::Escape($string_to_be_replaced)),$member
[System.IO.File]::AppendAllText($datafile2, $replacedContent, $encoding)
}
else {
[System.IO.File]::AppendAllText($datafile2, $_, $encoding)
}
#[System.IO.File]::WriteAllLines($datafile2, $replacedContent, $encoding)
}
Please help me figure out where I am going wrong.
System.IO.File.WriteAllLines takes either an array of strings or an IEnumerable of strings as its second parameter, and it cannot be used as a pipeline target because it is a .NET Framework method, not a cmdlet that handles pipeline input.
You should try storing your replaced content in a string[] and passing that as the parameter when saving the file.
param(
[String]$file
)
$encoding =[Text.Encoding]::GetEncoding('iso-8859-1')
$replacedContent = [string[]]@(Get-Content $file | ForEach-Object {
# Do stuff
})
[System.IO.File]::WriteAllLines($file, $replacedContent, $encoding)
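As a minimal sketch of this collect-then-write approach, the following example transforms a few in-memory lines, writes them once with the ISO-8859-1 encoding, and reads them back. The file name latin1-demo.dat, the sample lines, and the ABCD-to-WXYZ replacement are assumptions made for illustration, not the replacement logic from the question.

```powershell
$encoding = [Text.Encoding]::GetEncoding('iso-8859-1')

# Hypothetical output path in the temp folder; a full path is used on purpose.
$path = Join-Path ([IO.Path]::GetTempPath()) 'latin1-demo.dat'

# Invented sample input; [char]0xE9 is the Latin-1 character e-acute.
$lines = [string[]]@("caf$([char]0xE9) ABCD", 'plain line')

# Collect all transformed lines first ...
$replaced = [string[]]@($lines | ForEach-Object { $_ -replace 'ABCD', 'WXYZ' })

# ... then write them in a single call; each element becomes one line.
[System.IO.File]::WriteAllLines($path, $replaced, $encoding)

# Read back to check that the encoding was applied.
$bytes     = [System.IO.File]::ReadAllBytes($path)
$roundTrip = [System.IO.File]::ReadAllLines($path, $encoding)

Remove-Item -Path $path
```

Two details are worth keeping in mind. WriteAllLines adds the line terminator after each element, whereas AppendAllText writes the text as-is with no newline. Also, .NET methods resolve relative paths against the process working directory, which can differ from PowerShell's current location, so passing a full path is the safer choice.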
values.ini looks like
[default]
A=1
B=2
C=3
foo.txt looks like
Now is the %A% for %a% %B% men to come to the %C% of their %c%
I want to use Powershell to search for all of the %x% values in values.ini and then replace every matching instance in foo.txt with the corresponding value, case insensitively; generating the following:
Now is the 1 for 1 2 men to come to the 3 of their 3
Assuming PowerShell version 3.0 or newer, you can use the ConvertFrom-StringData cmdlet to parse the key-value pair in your ini file, but you'll need to filter out the [default] directive:
# grab relevant lines from file
$KeyValPairs = Get-Content .\values.ini | Where {$_ -like "*=*" }
# join strings together as one big string
$KeyValPairString = $KeyValPairs -join [Environment]::NewLine
# create hashtable/dictionary from string with ConvertFrom-StringData
$Dictionary = $KeyValPairString |ConvertFrom-StringData
You can then use the [regex]::Replace() method to do a lookup against the dictionary for each match you want to replace:
Get-Content .\foo.txt |ForEach-Object {
[Regex]::Replace($_, '%(\p{L}+)%', {
param($Match)
# look term up in dictionary
return $Dictionary[$Match.Groups[1].Value]
})
}
To complement Mathias R. Jessen's excellent answer with alternative approaches that also take the later requirement change of limiting values to a specific INI-file section into account (PSv2+, except for Get-Content -Raw; in PSv2, use (Get-Content ...) -join "`n" instead.)
Using PsIni\Get-IniContent and [environment]::ExpandEnvironmentVariables():
# Translate key-value pairs from the section of interest
# into environment variables.
# After this command, the following environment variables are defined:
# $env:A, with value 1 (cmd.exe equivalent: %A%)
# $env:B, with value 2 (cmd.exe equivalent: %B%)
# $env:C, with value 3 (cmd.exe equivalent: %C%)
$section = 'default' # Specify the INI-file section of interest.
(Get-IniContent values.ini)[$section].GetEnumerator() |
ForEach-Object { Set-Item "env:$($_.Name)" -Value $_.Value }
# Read the template string as a whole from file foo.txt, and expand the
# environment-variable references in it, using the .NET framework.
# With the sample input, this yields
# "Now is the 1 for 1 2 men to come to the 3 of their 3".
[environment]::ExpandEnvironmentVariables((Get-Content -Raw foo.txt))
The 3rd-party Get-IniContent cmdlet, which conveniently reads an INI file (*.ini) into a nested, ordered hashtable, can easily be installed with Install-Module PsIni from an elevated console (alternatively, add -Scope CurrentUser), if you have PS v5+ (or v3 or v4 with PackageManagement installed).
This solution takes advantage of the fact that the placeholders (e.g., %a%) look like cmd.exe-style environment-variable references.
Note the assumptions and caveats:
All ini-file keys / placeholder names are legal environment-variable names.
Preexisting variables may be overwritten, which can be problematic with names such as PATH.
Cross-platform caveat: on Unix-like platforms, environment-variable references are case-sensitive, so the solution won't work the same there.
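The expansion step can be tried in isolation with a throwaway variable. In this sketch, DEMO_A and NO_SUCH_DEMO_VAR are hypothetical names chosen so that nothing real is overwritten; the second one is assumed not to be defined in the environment.

```powershell
# Define a throwaway environment variable (hypothetical name).
$env:DEMO_A = '1'

# %DEMO_A% is expanded; a reference to an undefined variable is left as-is.
$expanded = [Environment]::ExpandEnvironmentVariables('Now is the %DEMO_A% for %NO_SUCH_DEMO_VAR%')

# Clean up so the variable does not linger in the session.
Remove-Item Env:\DEMO_A
```

Because undefined references are left in place rather than removed, a placeholder whose key is missing from the INI file shows up unchanged in the output, which makes such gaps easy to spot.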
Using custom INI-file parsing and [environment]::ExpandEnvironmentVariables():
If installing a module for INI-file parsing is not an option, the following solution uses a - rather complex - regular expression to extract the section of interest via the -replace operator.
$section = 'default' # Specify the INI-file section of interest.
# Get all non-empty, non-comment lines from the section using a regex.
$sectLines = (Get-Content -Raw values.ini) -replace ('(?smn)\A.*?(^|\r\n)\[' + [regex]::Escape($section) + '\]\r\n(?<sectLines>.*?)(\r\n\[.*|\Z)'), '${sectLines}' -split "`r`n" -notmatch '(^;|^\s*$)'
# Define the key-value pairs as environment variables.
$sectlines | ForEach-Object { $tokens = $_ -split '=', 2; Set-Item "env:$($tokens[0].Trim())" -Value $tokens[1].Trim() }
# Read the template string as a whole, and expand the environment-variable
# references in it, as before.
[environment]::ExpandEnvironmentVariables((Get-Content -Raw foo.txt))
I found a simpler solution using this INI script called Get-IniContent.
#read from Setup.ini
$INI = Get-IniContent .\Setup.ini
$sec="setup"
#REPLACE VARIABLES
foreach($c in Get-ChildItem -Path .\Application -Recurse -Filter *.config)
{
Write-Output $c.FullName
Write-Output $c.DirectoryName
$configFile = Get-Content $c.FullName -Raw
foreach($v in $INI[$sec].Keys)
{
$k = '%'+$v+'%'
$match = [regex]::IsMatch($configFile, $k)
if($match)
{
$configFile = $configFile -ireplace [regex]::Escape($k), $INI[$sec][$v]
}
}
Set-Content $c.FullName -Value $configFile
}