Powershell script to write to file maintaining structure - powershell

I am working with powershell to read in a file. See sample content in the file.
This is my file with content
-- #Start
This is more content
across different lines
etc etc
-- #End
I am using this code to read in file to a variable.
$content = Get-Content "Myfile.txt";
I then use this code to extract a particular section from the file, based on an opening and closing tag.
$stringBuilder = New-Object System.Text.StringBuilder;
$pattern = "-- #Start(.*?)-- #End";
$matched = [regex]::match($content, $pattern).Groups[1].Value;
$stringBuilder.AppendLine($matched.Trim());
$stringBuilder.ToString() | Out-File "Newfile.txt" -Encoding utf8;
The problem that I have is in the file I write to, the formatting is not maintained. So what I want is:
This is more content
across different lines
etc etc
But what I am getting is:
This is more content across different lines etc etc
Any ideas how I can alter my code so that the structure (multiple lines) is maintained in the output file?

This regex might do what you're looking for; I don't see a point in using a StringBuilder in this case. Do note that, since this is a multi-line regex pattern, you need to use the -Raw switch to read your file's content.
$re = [regex] '(?ms)(?<=^-- #Start\s*\r?\n).+?(?=^-- #End)'
$re.Match((Get-Content path\to\Myfile.txt -Raw)).Value |
Set-Content path\to\newFile.txt -NoNewLine
See https://regex101.com/r/82HJxf/1 for details.
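To see why the original attempt collapsed the text onto one line, here is a minimal sketch. It assumes a throwaway file in the temp directory (the name gr-demo-myfile.txt is made up for illustration) standing in for Myfile.txt. Without -Raw, Get-Content returns an array of lines, and when that array is converted to a single string, PowerShell joins the elements with spaces.

```powershell
# Assumption: a throwaway temp file stands in for Myfile.txt.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'gr-demo-myfile.txt'
$sample = 'This is my file with content', '-- #Start', 'This is more content',
          'across different lines', 'etc etc', '-- #End'
Set-Content -Path $path -Value $sample

$lines = Get-Content -Path $path        # string array: one element per line
$raw   = Get-Content -Path $path -Raw   # one string, newlines intact

# Converting the array to a string joins its elements with spaces,
# which is what happens when the array is handed to [regex]::Match().
$flattened = [string] $lines
$oldResult = [regex]::Match($flattened, '-- #Start(.*?)-- #End').Groups[1].Value.Trim()

# With -Raw, the multi-line pattern from the answer keeps the line breaks.
$re = [regex] '(?ms)(?<=^-- #Start\s*\r?\n).+?(?=^-- #End)'
$newResult = $re.Match($raw).Value

$oldResult   # This is more content across different lines etc etc
$newResult   # the three lines, each on its own line
```

The first result reproduces the single-line output from the question; the second keeps the original line structure.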
If you want to do line-by-line processing, you could use a switch to read and process the lines of interest. This is particularly useful if the file is very big and doesn't fit in memory.
& {
$capture = $false
switch -Regex -File path\to\Myfile.txt {
'^-- #Start' { $capture = $true }
'^-- #End' { $capture = $false }
Default { if($capture) { $_ } }
}
} | Set-Content path\to\newFile.txt
If there is only one appearance of the opening and closing tag, you could even break the switch as soon as it encounters the closing tag to stop processing:
'^-- #End' { break }
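Put together, the early-exit variant might look like the sketch below. It assumes a throwaway temp file (the name is made up for illustration) with extra content after the closing tag, to show that nothing past the first closing tag is processed.

```powershell
# Assumption: a throwaway temp file stands in for Myfile.txt.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'gr-demo-switch.txt'
$sample = 'This is my file with content', '-- #Start', 'This is more content',
          'across different lines', 'etc etc', '-- #End', 'trailing line'
Set-Content -Path $path -Value $sample

$captured = @(& {
    $capture = $false
    switch -Regex -File $path {
        '^-- #Start' { $capture = $true }
        '^-- #End'   { break }                     # stop reading at the first closing tag
        Default      { if ($capture) { $_ } }      # emit only lines inside the block
    }
})

$captured.Count   # 3
```

Piping the result to Set-Content, as in the answer, then writes one captured line per output line.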

Related

PowerShell Extract text between two strings with -Tail and -Wait

I have a text file with a large number of log messages.
I want to extract the messages between two string patterns. I want the extracted message to appear as it is in the text file.
I tried the following methods. It works, but doesn't support Get-Content's -Wait and -Tail options. Also, the extracted results are displayed in one line, but not like the text file. Inputs are welcome :-)
Sample Code
function GetTextBetweenTwoStrings($startPattern, $endPattern, $filePath){
# Get content from the input file
$fileContent = Get-Content $filePath
# Regular expression (Regex) of the given start and end patterns
$pattern = "$startPattern(.*?)$endPattern"
# Perform the Regex operation
$result = [regex]::Match($fileContent,$pattern).Value
# Finally return the result to the caller
return $result
}
# Clear the screen
Clear-Host
$input = "THE-LOG-FILE.log"
$startPattern = 'START-OF-PATTERN'
$endPattern = 'END-OF-PATTERN'
# Call the function
GetTextBetweenTwoStrings -startPattern $startPattern -endPattern $endPattern -filePath $input
Improved script based on Theo's answer.
The following points need to be improved:
The beginning and end of the output are somehow trimmed, even though I adjusted the buffer size in the script.
How to wrap each matched result into START and END string?
Still I could not figure out how to use the -Wait and -Tail options
Updated Script
# Clear the screen
Clear-Host
# Adjust the buffer size of the window
$bw = 10000
$bh = 300000
if ($host.name -eq 'ConsoleHost') # or -notmatch 'ISE'
{
[console]::bufferwidth = $bw
[console]::bufferheight = $bh
}
else
{
$pshost = get-host
$pswindow = $pshost.ui.rawui
$newsize = $pswindow.buffersize
$newsize.height = $bh
$newsize.width = $bw
$pswindow.buffersize = $newsize
}
function Get-TextBetweenTwoStrings ([string]$startPattern, [string]$endPattern, [string]$filePath){
# Get content from the input file
$fileContent = Get-Content -Path $filePath -Raw
# Regular expression (Regex) of the given start and end patterns
$pattern = '(?is){0}(.*?){1}' -f [regex]::Escape($startPattern), [regex]::Escape($endPattern)
# Perform the Regex operation and output
[regex]::Match($fileContent,$pattern).Groups[1].Value
}
# Input file path
$inputFile = "THE-LOG-FILE.log"
# The patterns
$startPattern = 'START-OF-PATTERN'
$endPattern = 'END-OF-PATTERN'
Get-TextBetweenTwoStrings -startPattern $startPattern -endPattern $endPattern -filePath $inputFile
You need to perform streaming processing of your Get-Content call, in a pipeline, such as with ForEach-Object, if you want to process lines as they're being read.
This is a must if you're using Get-Content -Wait, because such a call doesn't terminate by itself (it keeps waiting for new lines to be added to the file, indefinitely), but inside a pipeline its output can be processed as it is being received, even before the command terminates.
You're trying to match across multiple lines, which with Get-Content output would only work if you used the -Raw switch - by default, Get-Content reads its input file(s) line by line.
However, -Raw is incompatible with -Wait.
Therefore, you must stick with line-by-line processing, which requires that you match the start and end patterns separately, and keep track of when you're processing lines between those two patterns.
Here's a proof of concept, but note the following:
-Tail 100 is hard-coded - adjust as needed or make it another parameter.
The use of -Wait means that the function will run indefinitely - waiting for new lines to be added to $filePath - so you'll need to use Ctrl-C to stop it.
While you can use a Get-TextBetweenTwoStrings call itself in a pipeline for object-by-object processing, assigning its result to a variable ($result = ...) won't work when terminating with Ctrl-C, because this method of termination also aborts the assignment operation.
To work around this limitation, the function below is defined as an advanced function, which automatically enables support for the common -OutVariable parameter, which is populated even in the event of termination with Ctrl-C; your sample call would then look as follows (as Theo notes, don't use the automatic $input variable as a custom variable):
# Look for blocks of interest in the input file, indefinitely,
# and output them as they're being found.
# After termination with Ctrl-C, $result will also contain the blocks
# found, if any.
Get-TextBetweenTwoStrings -OutVariable result -startPattern $startPattern -endPattern $endPattern -filePath $inputFile
Per your feedback you want the block of lines to encompass the full lines on which the start and end patterns match, so the regexes below are enclosed in .*
The word pattern in your $startPattern and $endPattern parameters is a bit ambiguous in that it suggests that they themselves are regexes that can therefore be used as-is or embedded as-is in a larger regex on the RHS of the -match operator.
However, in the solution below I am assuming that they are to be treated as literal strings, which is why they are escaped with [regex]::Escape(); simply omit these calls if these parameters are indeed regexes themselves; i.e.:
$startRegex = '.*' + $startPattern + '.*'
$endRegex = '.*' + $endPattern + '.*'
The solution assumes there is no overlap between blocks and that, in a given block, the start and end patterns are on separate lines.
Each block found is output as a single, multi-line string, using LF ("`n") as the newline character; if you want CRLF newline sequences instead, use "`r`n"; for the platform-native newline format (CRLF on Windows, LF on Unix-like platforms), use [Environment]::NewLine.
# Note the use of "-" after "Get", to adhere to PowerShell's
# "<Verb>-<Noun>" naming convention.
function Get-TextBetweenTwoStrings {
# Make the function an advanced one, so that it supports the
# -OutVariable common parameter.
[CmdletBinding()]
param(
$startPattern,
$endPattern,
$filePath
)
# Note: If $startPattern and $endPattern are themselves
# regexes, omit the [regex]::Escape() calls.
$startRegex = '.*' + [regex]::Escape($startPattern) + '.*'
$endRegex = '.*' + [regex]::Escape($endPattern) + '.*'
$inBlock = $false
$block = [System.Collections.Generic.List[string]]::new()
Get-Content -Tail 100 -Wait $filePath | ForEach-Object {
if ($inBlock) {
if ($_ -match $endRegex) {
$block.Add($Matches[0])
# Output the block of lines as a single, multi-line string
$block -join "`n"
$inBlock = $false; $block.Clear()
}
else {
$block.Add($_)
}
}
elseif ($_ -match $startRegex) {
$inBlock = $true
$block.Add($Matches[0])
}
}
}
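For experimenting without a file that keeps growing, the same state machine can be sketched as a function that takes lines from the pipeline and therefore terminates on its own. The function name Select-BlockBetween and the sample lines are made up for illustration; unlike the function above, it emits the whole line directly rather than going through $Matches, and it treats the patterns as literal strings.

```powershell
# Hypothetical helper: same block-tracking logic, but fed from the pipeline (no -Wait).
function Select-BlockBetween {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [string] $StartPattern,
        [Parameter(Mandatory)] [string] $EndPattern,
        [Parameter(ValueFromPipeline)] [AllowEmptyString()] [string] $Line
    )
    begin {
        # Patterns are assumed to be literal text, hence the escaping.
        $startRegex = [regex]::Escape($StartPattern)
        $endRegex   = [regex]::Escape($EndPattern)
        $inBlock    = $false
        $block      = [System.Collections.Generic.List[string]]::new()
    }
    process {
        if ($inBlock) {
            $block.Add($Line)
            if ($Line -match $endRegex) {
                # Output the block as a single, multi-line string
                $block -join "`n"
                $inBlock = $false
                $block.Clear()
            }
        }
        elseif ($Line -match $startRegex) {
            $inBlock = $true
            $block.Add($Line)
        }
    }
}

$sample = 'noise', 'x START-OF-PATTERN a', 'body 1', 'b END-OF-PATTERN y',
          'noise', 'START-OF-PATTERN', 'body 2', 'END-OF-PATTERN'
$blocks = @($sample | Select-BlockBetween -StartPattern 'START-OF-PATTERN' -EndPattern 'END-OF-PATTERN')
$blocks.Count   # 2
```

Because it reads from the pipeline, the same function could also sit behind Get-Content -Tail 100 -Wait $filePath, in which case it would again run until stopped with Ctrl-C.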
First of all, you should not use $input as a self-defined variable name, because it is an automatic variable.
Then, you are reading the file as a string array, whereas you would rather read it as a single, multiline string. To do that, append the -Raw switch to the Get-Content call.
The regex you are creating does not allow for regex special characters in the start and end patterns you give, so I would suggest using [regex]::Escape() on these patterns when creating the regex string.
While your regex does use a capturing group inside the parentheses, you are not using it when it comes to getting the value you seek.
Finally, I would recommend using the PowerShell naming convention (Verb-Noun) for the function name.
Try
function Get-TextBetweenTwoStrings ([string]$startPattern, [string]$endPattern, [string]$filePath){
# Get content from the input file
$fileContent = Get-Content -Path $filePath -Raw
# Regular expression (Regex) of the given start and end patterns
$pattern = '(?is){0}(.*?){1}' -f [regex]::Escape($startPattern), [regex]::Escape($endPattern)
# Perform the Regex operation and output
[regex]::Match($fileContent,$pattern).Groups[1].Value
}
$inputFile = "D:\Test\THE-LOG-FILE.log"
$startPattern = 'START-OF-PATTERN'
$endPattern = 'END-OF-PATTERN'
Get-TextBetweenTwoStrings -startPattern $startPattern -endPattern $endPattern -filePath $inputFile
Would result in something like:
blahblah
more lines here
The (?is) makes the regex case-insensitive and have the dot match linebreaks as well
Nice to see you're using my version of the Get-TextBetweenTwoStrings function; however, I believe you are mistaking how the output looks in the console for how it looks in a dedicated text editor. In the console, lines that are too long are truncated, whereas in a text editor like Notepad you can choose to wrap long lines or have a horizontal scrollbar.
If you simply append
| Set-Content -Path 'X:\wherever\theoutput.txt'
to the Get-TextBetweenTwoStrings .. call, you will find the lines are NOT truncated when you open it in Word or notepad for instance.
In fact, you can have that line followed by
notepad 'X:\wherever\theoutput.txt'
to have notepad open that file straight away.

Powershell- match split and replace based on index

I have a file
AB*00*Name1First*Name1Last*test
BC*JCB*P1*Church St*Texas
CD*02*83*XY*Fax*LM*KY
EF*12*Code1*TX*1234*RJ
I need to replace the 5th element in the CD segment alone from LM to ET in each of the files in the folder. The element delimiter is *, as shown in the sample file content above. I am new to PowerShell and tried the code below, but unfortunately it is not giving the desired results. Can any of you please provide some help?
foreach($xfile in $inputfolder)
{
If ($_ match "^CD\*")
{
[System.IO.File]::ReadAllText($xfile).replace(($_.split("*")[5],"ET") | Set-Content $xfile
}
[System.IO.File]::WriteAllText($xfile),((Get-Content $xfile -join("~")))
}
here's a slightly different way to get there ... [grin] what it does ...
fakes reading in a test file
when ready to do this for real, remove the entire #region/#endregion block and use Get-Content.
sets the constants
iterates thru the imported text file lines
checks for a line that starts with the target pattern
if found ...
== escapes the old value with [regex]::Escape() to deal with the asterisks
== replaces the escaped old value with the new value
== outputs the new version of that line
if NOT found, outputs the line as-is
stores all the lines into the $OutStuff var
displays that on screen
the code ...
#region >>> fake reading in a plain text file
# in real life, use Get-Content
$InStuff = @'
AB*00*Name1First*Name1Last*test
BC*JCB*P1*Church St*Texas
CD*02*83*XY*Fax*LM*KY
EF*12*Code1*TX*1234*RJ
'@ -split [System.Environment]::NewLine
#endregion >>> fake reading in a plain text file
$TargetLineStart = 'CD*'
$OldValue = '*LM*'
$NewValue = '*ET*'
$OutStuff = foreach ($IS_Item in $InStuff)
{
if ($IS_Item.StartsWith($TargetLineStart))
{
$IS_Item -replace [regex]::Escape($OldValue), $NewValue
}
else
{
$IS_Item
}
}
$OutStuff
output ...
AB*00*Name1First*Name1Last*test
BC*JCB*P1*Church St*Texas
CD*02*83*XY*Fax*ET*KY
EF*12*Code1*TX*1234*RJ
i will leave saving that to a new file [or overwriting the old one] to the user. [grin]
You could capture everything that comes before the match in group 1, and then match LM.
In the replacement, use $1ET.
^(CD\*(?:[^*\r\n]+\*){4})LM\b
Regex demo
If you don't want to match LM literally, you could instead match any run of characters other than * or a newline.
^(CD\*(?:[^*\r\n]+\*){4})[^*\r\n]+\b
Replace example
$allText = Get-Content -Raw file.txt
$allText -replace '(?m)^(CD\*(?:[^*\r\n]+\*){4})LM\b','$1ET'
Output
AB*00*Name1First*Name1Last*test
BC*JCB*P1*Church St*Texas
CD*02*83*XY*Fax*ET*KY
EF*12*Code1*TX*1234*RJ
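Since the question asks for a replacement by position rather than by value, a split-and-join approach may also be worth considering. The following is only a sketch: the function name Set-SegmentElement and its parameters are made up for illustration, and it assumes the target is the zero-based index 5 that the question's own code uses.

```powershell
# Hypothetical helper: replace one element, by zero-based index, on lines of a given segment.
function Set-SegmentElement {
    param(
        [string[]] $Lines,
        [string]   $Segment  = 'CD',
        [int]      $Index    = 5,
        [string]   $NewValue = 'ET'
    )
    foreach ($line in $Lines) {
        $fields = $line.Split('*')
        # Note: -eq compares case-insensitively by default.
        if ($fields[0] -eq $Segment -and $fields.Count -gt $Index) {
            $fields[$Index] = $NewValue
        }
        $fields -join '*'
    }
}

$sample = 'AB*00*Name1First*Name1Last*test',
          'BC*JCB*P1*Church St*Texas',
          'CD*02*83*XY*Fax*LM*KY',
          'EF*12*Code1*TX*1234*RJ'
$updated = @(Set-SegmentElement -Lines $sample)
$updated[2]   # CD*02*83*XY*Fax*ET*KY

# Applying it to every file in a folder ($inputFolder is assumed to hold the folder path):
# Get-ChildItem -Path $inputFolder -File | ForEach-Object {
#     $new = Set-SegmentElement -Lines (Get-Content -Path $_.FullName)
#     Set-Content -Path $_.FullName -Value $new
# }
```

Reading each file completely into a variable before calling Set-Content avoids writing to a file that is still being read.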

How to get output in desired encoding scheme using powershell out-file

I have a requirement, in which I need to do read line by line, and then do string/character replacement in a datafile having data in windows latin 1.
I've written this powershell (my first one) initially using out-file -encoding option. However the output file thus created was doing some character translation. Then I searched and came across WriteAllLines, but I'm unable to use it in my code.
$encoding =[Text.Encoding]::GetEncoding('iso-8859-1')
$pdsname="ABCD.XYZ.PQRST"
$datafile="ABCD.SCHEMA.TABLE.DAT"
Get-Content ABCD.SCHEMA.TABLE.DAT | ForEach-Object {
$matches = [regex]::Match($_,'ABCD')
$string_to_be_replaced=$_.substring($matches.Index,$pdsname.Length+10)
$string_to_be_replaced="`"$string_to_be_replaced`""
$member = [regex]::match($_,"`"$pdsname\(([^\)]+)\)`"").Groups[1].Value
$_ -replace $([regex]::Escape($string_to_be_replaced)),$member
} | [System.IO.File]::WriteAllLines("C:\Users\USer01", "ABCD.SCHEMA.TABLE.NEW.DAT", $encoding)
With the help of an answer from @Gzeh Niert, I updated my script above. However, when I execute it, the output file contains only the last record: the write did not append but overwrote the file. I tried using [System.IO.File]::AppendAllText, but this strangely creates a larger file that still has only the last record. In short, it's likely that empty lines are being written.
param(
[String]$datafile
)
$pdsname="ABCD.XYZ.PQRST"
$encoding =[Text.Encoding]::GetEncoding('iso-8859-1')
$datafile = "ABCD.SCHEMA.TABLE.DAT"
$datafile2="ABCD.SCHEMA.TABLE.NEW.DAT"
Get-Content $datafile | ForEach-Object {
$matches = [regex]::Match($_,'ABCD')
if($matches.Success) {
$string_to_be_replaced=$_.substring($matches.Index,$pdsname.Length+10)
$string_to_be_replaced="`"$string_to_be_replaced`""
$member = [regex]::match($_,"`"$pdsname\(([^\)]+)\)`"").Groups[1].Value
$replacedContent = $_ -replace $([regex]::Escape($string_to_be_replaced)),$member
[System.IO.File]::AppendAllText($datafile2, $replacedContent, $encoding)
}
else {
[System.IO.File]::AppendAllText($datafile2, $_, $encoding)
}
#[System.IO.File]::WriteAllLines($datafile2, $replacedContent, $encoding)
}
Please help me figure out where I am going wrong.
System.IO.File.WriteAllLines takes either an array of strings or an IEnumerable of strings as its second parameter, and it cannot be piped to, because it is not a cmdlet that handles pipeline input but a .NET Framework method.
You should try storing your replaced content in a string[] and passing it as a parameter when saving the file.
param(
[String]$file
)
$encoding =[Text.Encoding]::GetEncoding('iso-8859-1')
$replacedContent = [string[]]@(Get-Content $file | ForEach-Object {
# Do stuff
})
[System.IO.File]::WriteAllLines($file, $replacedContent, $encoding)
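As a fuller illustration of this approach, here is a minimal sketch that reads and writes with the same Latin-1 encoding object. The temp file names and the simple ABCD-to-WXYZ replacement are placeholders made up for the example, not the asker's actual logic. Two details it relies on: .NET file methods resolve relative paths against the process working directory rather than PowerShell's current location, so full paths are safer; and AppendAllText writes the text exactly as given, without adding a line terminator, whereas WriteAllLines terminates each line.

```powershell
# Latin-1 encoding object, as in the question
$encoding = [System.Text.Encoding]::GetEncoding('iso-8859-1')

# Assumption: throwaway files in the temp directory stand in for the real data files.
$tempDir = [System.IO.Path]::GetTempPath()
$inPath  = Join-Path $tempDir 'gr-demo-latin1-in.dat'
$outPath = Join-Path $tempDir 'gr-demo-latin1-out.dat'

# Create sample input containing a non-ASCII Latin-1 character (e-acute, byte 0xE9)
$eAcute = [char] 0xE9
[System.IO.File]::WriteAllLines($inPath, [string[]] @("caf$eAcute ABCD", 'second line'), $encoding)

# Read with the same encoding, transform line by line, and collect into a string[]
$replaced = [string[]] @(
    [System.IO.File]::ReadAllLines($inPath, $encoding) | ForEach-Object {
        $_ -replace 'ABCD', 'WXYZ'   # placeholder for the real replacement logic
    }
)

# Write all lines in one call, again as Latin-1
[System.IO.File]::WriteAllLines($outPath, $replaced, $encoding)

# The accented character is stored as the single Latin-1 byte 0xE9
$bytes = [System.IO.File]::ReadAllBytes($outPath)
$bytes -contains 0xE9   # True
```

Reading with an explicit encoding matters as much as writing with one: if the input is decoded with a different encoding, the characters are already altered before any replacement runs.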

Capture ffmpeg's metadata output in powershell

I'm trying to capture the output of ffmpeg in PowerShell(tm) to get some metadata on some ogg & mp3 files. But when I do:
ffmpeg -i file.ogg 2>&1 | sls GENRE
The output includes a bunch of lines without my matching string, "GENRE":
album_artist : Post Human Era
ARTIST : Post Human Era
COMMENT : Visit http://posthumanera.bandcamp.com
DATE : 2013
GENRE : Music
TITLE : Supplies
track : 1
At least one output file must be specified
I am guessing something is different in the encoding. ffmpeg's output is colored, so maybe there are color control characters in the output that are breaking things? Or, maybe ffmpeg's output isn't playing nicely with powershell's default UTF-16? I can't figure out if there is another way to redirect stderr and remove the color characters or change the encoding of stderr.
EDIT:
Strangely, I also get indeterminate output. Sometimes the output is as shown above. Sometimes with precisely the same command the output is:
GENRE :
Which makes slightly more sense, but is still missing the part of the line I care about ('Music').
Somewhere powershell is interpreting something as newlines that is not newlines.
I am still seeing this behavior when I use the old powershell, but I have since upgraded to PowerShell Core (7.0.2), and the problem seems to be solved. I read somewhere that with PowerShell Core they've changed the default encoding to UTF-8, so perhaps it is something related to that.
My theory is that in the old version, whatever code combines the output streams would normally make sure that individual lines were preserved and interleaved instead of being cut up. But I would guess that this code is looking for newlines in the default encoding, not UTF-8, so when it receives two UTF-8 streams it doesn't parse the line delimiters correctly and you get weird splits. It seems like there should be a way to change the encoding before it gets to mixing the output streams, but I'm not sure (and now it doesn't matter since it works). Why the output seems to change nondeterministically, I don't know, unless there is something nondeterministic about parsing UTF-8 bytes as if they were UTF-16 or whatever the default is.
I got something working for my script by catching all the output with a regex and piping the matches into a custom object.
Function Rotate-Video {
param(
[STRING]$FFMPEGEXE = "P:\Video Editing\ffmpeg-4.3.1-2020-10-01-full_build\bin\ffmpeg.exe",
[parameter(ValueFromPipeline = $true)]
[STRING]$Source = "D:\Video\Source",
[STRING]$Destination = 'D:\Video\Destination',
[STRING]$DestinationExtention='mp4'
)
(Get-ChildItem $Source) | ForEach-Object {
$FileExist = $false
$Source = $_.fullname
$Name = $_.basename
$outputName = $name+'.'+$DestinationExtention
$Fullpath = Join-Path -Path $Destination -ChildPath $outputName
$Regex = "(\w+)=\s+(\d+)\s+(\w+)=(\d+.\d+)\s+(\w)=(\d+.\d+)\s+(\w+)=\s+(\d+)\w+\s+(\w+)=(\d+:\d+:\d+.\d+)\s+(\w+)=(\d+.\d+)\w+\/s\s+(\w+)=(\d+.\d+)"
&$FFMPEGEXE -i $Source -vf transpose=clock $Fullpath 2>&1 | Select-String -Pattern $Regex | ForEach-Object {
$output = ($_ | Select-String -Pattern $regex).Matches.Groups
[PSCUSTOMOBJECT]@{
Source = $source
Destination = $Fullpath
$output[1] = $output[2]
$output[3] = $output[4]
$output[5] = $output[6]
$output[7] = $output[8]
$output[9] = $output[10]
$output[11] = $output[12]
$output[13] = $output[14]
}
}
}
}
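For the original goal of reading tags such as GENRE, one option is to turn the captured output into plain strings first and then parse the "key : value" lines. The sketch below works on sample lines copied from the question instead of calling ffmpeg; the function name ConvertFrom-FfmpegMetadata is made up for illustration, and the regex assumes the "key : value" layout shown in the question.

```powershell
# Hypothetical helper: collect "key : value" lines into an ordered dictionary.
function ConvertFrom-FfmpegMetadata {
    param([string[]] $Lines)
    $result = [ordered] @{}
    foreach ($line in $Lines) {
        # Assumed layout: optional indent, a tag name, a colon, then the value.
        if ($line -match '^\s*(?<key>[A-Za-z_]+)\s*:\s(?<value>.+)$') {
            $result[$Matches['key']] = $Matches['value'].Trim()
        }
    }
    $result
}

# Sample lines taken from the question's output
$sample = 'album_artist : Post Human Era',
          'ARTIST : Post Human Era',
          'COMMENT : Visit http://posthumanera.bandcamp.com',
          'DATE : 2013',
          'GENRE : Music',
          'TITLE : Supplies',
          'track : 1',
          'At least one output file must be specified'
$meta = ConvertFrom-FfmpegMetadata -Lines $sample
$meta['GENRE']   # Music

# With a real file, the lines could be captured as strings first, for example:
# $lines = & ffmpeg -i file.ogg 2>&1 | ForEach-Object { "$_" }
# $meta  = ConvertFrom-FfmpegMetadata -Lines $lines
```

Converting each pipeline item with "$_" is there because, in Windows PowerShell, stderr lines redirected with 2>&1 arrive as error records rather than plain strings; whether that fully explains the odd matches in the question is not something this sketch establishes.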

In PowerShell, I want to loop twice through a text file, but in the second loop I want to continue from the end of the first loop

I have a text file to process. Text file has some configuration data and some networking commands. I want to run all those network commands and redirect output in some log file.
At the start of the text file there is some configuration information, such as the file name and file location. This can be used for naming the log file and choosing its location. These lines start with special characters like '<#:', which mark the rest of the line as config data about the file rather than a command to execute.
Before I start executing the networking commands (which start with special characters like '<:'), I first want to read all the configuration information about the file, i.e. file name, location, overwrite flag, etc. Then I can run all the commands and dump the output into the log file.
I used get-content iterator to loop over entire text file.
Question: Is there any way to start looping over file from a specific line again?
So that i can process config information first (loop till i first encounter command to execute, remember this line number), create log file and then keep running commands and redirect output to log file (loop from last remembered line number).
Config File looks like:
<#Result_File_Name:dump1.txt
<#Result_File_Location:C:\powershell
<:ping www.google.com
<:ipconfig
<:traceroute www.google.com
<:netsh interface ip show config
My PowerShell script looks like:
$content = Get-Content C:\powershell\config.txt
foreach ($line in $content)
{
if($line.StartsWith("<#Result_File_Name:")) #every time i am doing this, even for command line
{
$result_file_arr = $line.split(":")
$result_file_name = $result_file_arr[1]
Write-Host $result_file_name
}
#if($line.StartsWith("<#Result_File_Location:"))#every time i am doing this, even for command line
#{
# $result_file_arr = $line.split(":")
# $result_file_name = $result_file_arr[1]
#}
if( $conf_read_over =1)
{
break;
}
if ($line.StartsWith("<:")) #In this if block, i need to run all commands
{
$items = $line.split("<:")
#$items[0]
#invoke-expression $items[2] > $result_file_name
invoke-expression $items[2] > $result_file_name
}
}
If all the config information starts with <# just process those out first separately. Once that is done you can assume the rest are commands?
# Collect config lines and process
$config = $content | Where-Object{$_.StartsWith('<#')} | ForEach-Object{
$_.Trim("<#") -replace "\\","\\" -replace "^(.*?):(.*)" , '$1 = $2'
} | ConvertFrom-StringData
# Process all the lines that are command lines.
$content | Where-Object{!$_.StartsWith('<#') -and ![string]::IsNullOrEmpty($_)} | ForEach-Object{
Invoke-Expression $_.trimstart("<:")
}
I went a little overboard with the config section. What I did was convert it into a hashtable. Now you have your config options, as they were in the file, accessible as an object.
$config
Name Value
---- -----
Result_File_Name dump1.txt
Result_File_Location C:\powershell
Small reconfiguration of your code, with some parts missing, would look like the following. You will most likely need to tweak this to your own needs.
# Collect config lines and process
$config = ($content | Where-Object{$_.StartsWith('<#')} | ForEach-Object{
$_.Trim("<#") -replace "\\","\\" -replace "^(.*?):(.*)" , '$1 = $2'
} | Out-String) | ConvertFrom-StringData
# Process all the lines that are command lines.
$content | Where-Object{!$_.StartsWith('<#') -and ![string]::IsNullOrEmpty($_)} | ForEach-Object{
Invoke-Expression $_.trimstart("<:") | Add-Content -Path $config.Result_File_Name
}
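The same separation of config lines from command lines can also be done in a single pass without ConvertFrom-StringData. The sketch below only collects the commands instead of running them, and uses the sample config from the question as in-memory lines. Splitting on the first colon only (the 2 passed to -split) keeps a value such as C:\powershell intact, which a plain split on ":" would cut in two.

```powershell
# Sample lines from the question's config file
$content = '<#Result_File_Name:dump1.txt',
           '<#Result_File_Location:C:\powershell',
           '<:ping www.google.com',
           '<:ipconfig',
           '<:traceroute www.google.com',
           '<:netsh interface ip show config'

$config   = @{}
$commands = [System.Collections.Generic.List[string]]::new()

foreach ($line in $content) {
    if ($line.StartsWith('<#')) {
        # Drop the '<#' prefix, then split into name and value at the first ':' only
        $name, $value = $line.Substring(2) -split ':', 2
        $config[$name] = $value
    }
    elseif ($line.StartsWith('<:')) {
        # Drop the '<:' prefix; the remainder is the command text
        $commands.Add($line.Substring(2))
    }
}

$config['Result_File_Location']   # C:\powershell
$commands.Count                   # 4
```

Once $config is complete, the collected commands can be run in a second loop and their output appended to the log file, as in the answer's Add-Content call.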
As per your comment you are still curious about your restart loop logic which was part of your original question. I will add this as a separate answer to that. I would still prefer my other approach.
# Use a flag to determine if we have already restarted. Assume False
$restarted = $false
$restartIndexPoint = 4
$restartIndex = 2
for($contentIndex = 0; $contentIndex -lt $content.Length; $contentIndex++){
Write-Host ("Line#{0} : {1}" -f $contentIndex, $content[$contentIndex])
# Check to see if we are on the $restartIndexPoint for the first time
if(!$restarted -and $contentIndex -eq $restartIndexPoint){
# Set the flag so this does not get repeated.
$restarted = $true
# Reset the index to repeat some steps over again.
$contentIndex = $restartIndex
}
}
Remember that array indexing is 0 based when you are setting your numbers. Line 20 is element 19 in the string array for example.
Inside the loop we run a check. If it passes, we change the current index to something earlier. The Write-Host will just print the lines so you can see the "restart" portion. We need a flag to be set so that we are not running an infinite loop.