Get-Content & combine "continued lines" - powershell

I have a txt file pulled into an array using Get-Content which uses _ as the line continue token, and the number of continued lines can be anything from one to many. So the text might look like this...
Jrn.Directive "DocSymbol" _
, "[Commercial-Default.rte]"
Jrn.Directive "GlobalToProj" _
, "[Commercial-Default.rte]", "Floor Plan: Level 1" _
, 0.01041666666667 _
, 1.00000000000000, 0.00000000000000, 0.00000000000000 _
, 0.00000000000000, 1.00000000000000, 0.00000000000000 _
, 0.00000000000000, 0.00000000000000, 1.00000000000000 _
, 0.00000000000000, 0.00000000000000, 0.00000000000000
I would like to reformat without line wrapping, and I am wondering if there is some super elegant approach to this that I am not seeing. What I see as the way forward is a foreach over $line in $array: if the line EndsWith("_"), set a start index to that line's index, then search forward until a line doesn't EndsWith("_") and set an end index, combine the bits and write them to a temporary array, then skip the difference between the two indexes as the main loop continues to read lines. I hope that makes sense without more detailed pseudo code.
In any case, it seems clumsy and inelegant, and I wonder if there is a better approach?
My initial thought was that Get-Content might have something built in, but it looks like the only delimiter you can define is End of Line (defaults to \n).
So, based on Anthony's input, and realizing that I needed to combine lines first, THEN remove irrelevant lines (that may have been multiple lines to start with) I now have this.
$target = 'Z:\Support\Px 3.0\RFO Benchmark\Journal Cleanup\journal.0010.txt'
$cleanFile = 'Z:\Support\Px 3.0\RFO Benchmark\Journal Cleanup\journal.0010.CLEAN.txt'
$sourceFile = Get-Content $target
$cleanData = @()
function Relavant {
[CmdletBinding()]
param (
[string]$line
)
$irrelevant = @('Jrn.Directive "Username"', 'Jrn.Directive "IdleTimeTaskSymbol"', 'Jrn.Directive "WindowSize"', 'Jrn.Size')
foreach ($item in $irrelevant) {
if ($line.StartsWith($item)) {
# Any single match marks the line as irrelevant, so stop checking here
return $false
}
}
# No prefix matched, so the line is relevant
$true
}
$string = ''
$continue = $false
$tempData = $(foreach ($line in $sourceFile) {
if ($line -match '^[^,]') {
$string = ''
$continue = $true
}
if ($continue) {
$string += $line
}
if ($line.EndsWith('_')) {
$continue = $true
} else {
$continue = $false
$string -replace '\s?_'
}
})
# Remove comments & irrelevant lines and do basic formatting
foreach ($line in $tempData) {
$line = $line.Trim()
if (-not ($line.StartsWith("'"))) {
if (Relavant $line) {
$line = $line -replace " ,", ","
$line = $line -replace '\s+', ' '
$cleanData += $line
}
}
}
Add-Content $cleanFile "' Cleaned by PxJournalCleaner`n"
foreach ($line in $cleanData) {
Add-Content $cleanFile $line
}
It's working well, but I suspect I will implement it again with the alternative approach just for the education factor if nothing else. I'm also not sure I fully understand what's going on in Anthony's approach, so I obviously still have some poking around to do. Thanks all!

You should probably make the regex matches a little more precise, but this worked for me:
$file = gc 'C:\temp\new 1.txt'
$string = ''
$cont = $false
$result = $(foreach ($line in $file) {
if ($line -match '^[^,]') {
$string = ''
$cont = $true
}
if ($cont) {
$string += $line
}
if ($line.EndsWith('_')) {
$cont = $true
} else {
$cont = $false
$string -replace '\s?_'
}
})
$result

Your approach seems totally fine, although I would probably just do it one line at a time.
You could do something like:
# read the wrapped lines from file
$lines = Get-Content C:\yourfile.txt
# initialize an array with a single empty string + a cursor that we'll use to keep track of the last index
$unwrappedLines = ,""
$cursor = 0
# iterate over the input strings
foreach($line in $lines){
if($line.EndsWith(" _")){
# Line is to be continued, remove line continuation character and add the rest of the string to the current index in our new array
$unwrappedLines[$cursor] += $line.Substring(0,$line.Length - 2)
}
else
{
# Line is not to be continued, add value as-is to current index
$unwrappedLines[$cursor] += $line
# Then append a new empty string to the array and advance the cursor to it
# (assigning to an index past the end of a fixed-size array would throw)
$unwrappedLines += ""
$cursor++
}
}

If the file is small enough just read this in as one string and replace all the _newlines with nothing.
(Get-Content -Raw "c:\temp\test.txt") -replace "_`r`n"
-Raw requires PowerShell 3.0 or later. If you don't have that, then Out-String to the rescue.
(Get-Content "c:\temp\test.txt" | Out-String) -replace "_`r`n"
Just need to find any underscore that is followed by a new line and remove it.
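A slightly stricter variant of the same idea, as a rough sketch: it only removes an underscore that sits directly before a line break (plus any spaces or tabs around it), and it accepts both `r`n and `n line endings. The function name Join-ContinuedLine and the sample text are made up for illustration; they are not from the question.

```powershell
function Join-ContinuedLine {
    param([string]$Text)
    # Drop "optional blanks, underscore, optional blanks, line break" in one pass
    $joined = $Text -replace '[ \t]*_[ \t]*\r?\n', ''
    # Return the logical lines, splitting on either line-ending style
    $joined -split '\r?\n'
}

# Hypothetical sample resembling the journal file in the question
$sample = "Jrn.Directive `"DocSymbol`" _`r`n, `"[Commercial-Default.rte]`"`r`nJrn.Size 1, 2"
$lines = Join-ContinuedLine -Text $sample
$lines
```

With a real file you would pass (Get-Content -Raw $path) as -Text. A data line that legitimately ends in an underscore would still be joined, so this assumes the journal format never does that.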

Related

Remove the need to use out-file only to import the file immediately using PowerShell just to convert the base type

I am attempting to turn the file below into one that contains no comments '#', no blank lines, no unneeded spaces, and only one entry per line. I'm unsure how to run the following code without the need to output the file and then reimport it. There should be code that doesn't require that step but I can't find it. The way I wrote my script also doesn't look right to me even though it works. As if there was a more elegant way of doing what I'm attempting but I just don't see it.
Before File Change: TNSNames.ora
#Created 9_27_16
#Updated 8_30_19
AAAA.world=(DESCRIPTION =(ADDRESS_LIST =
(ADDRESS =
(COMMUNITY = tcp.world)
(PROTOCOL = TCP)
(Host = www.url1111.com)
(Port = 1111)
)
)
(CONNECT_DATA = (SID = SID1111)
)
)
#Created 9_27_16
BBBB.world=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(COMMUNITY=tcp.world)(PROTOCOL=TCP)(Host=url2222.COM)(Port=2222))(ADDRESS=(COMMUNITY=tcp.world)(PROTOCOL=TCP)(Host=url22222.COM)(Port=22222)))(CONNECT_DATA=(SID=SID2222)))
CCCC.world=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(Host=url3333.COM)(Port=3333))(CONNECT_DATA=(SID=SID3333)))
DDDD.url =(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(COMMUNITY=tcp.world)(PROTOCOL=TCP)(Host=URL4444 )(Port=4444))(ADDRESS=(COMMUNITY=TCP.world)(PROTOCOL=TCP)(Host=URL44444 )(Port=44444)))(CONNECT_DATA=(SID=SID4444 )(GLOBAL_NAME=ASDF.URL)))
#Created 9_27_16
#Updated 8_30_19
After File Change:
AAAA.world=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(COMMUNITY=tcp.world)(PROTOCOL=TCP)(Host=www.url1111.com)(Port=1111)))(CONNECT_DATA=(SID=SID1111)))
BBBB.world=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(COMMUNITY=tcp.world)(PROTOCOL=TCP)(Host=url2222.COM)(Port=2222))(ADDRESS=(COMMUNITY=tcp.world)(PROTOCOL=TCP)(Host=url22222.COM)(Port=22222)))(CONNECT_DATA=(SID=SID2222)))
CCCC.world=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(Host=url3333.COM)(Port=3333))(CONNECT_DATA=(SID=SID3333)))
DDDD.url=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(COMMUNITY=tcp.world)(PROTOCOL=TCP)(Host=URL4444)(Port=4444))(ADDRESS=(COMMUNITY=TCP.world)(PROTOCOL=TCP)(Host=URL44444)(Port=44444)))(CONNECT_DATA=(SID=SID4444)(GLOBAL_NAME=ASDF.URL)))
Code:
# Get the file
[System.IO.FileInfo] $File = 'C:\temp\TNSNames.ora'
[string] $data = (Get-Content $File.FullName | Where-Object { !$_.StartsWith('#') }).ToUpper()
# Convert the data. This part is where any (CONNECT_DATA entry ends up on its own line.
$Results = $data.Replace(" ", "").Replace("`t", "").Replace(")))", ")))`n")
# Convert $Results from BaseType of System.Object to System.Array
$Path = '.\.vscode\StringResults.txt'
$Results | Out-File -FilePath $Path
$Results = Get-Content $Path
# Find all lines that start with '(CONNECT_DATA'
for ($i = 0; $i -lt $Results.Length - 1; $i++) {
if ($Results[$i + 1].StartsWith("(CONNECT_DATA")) {
# Add the '(CONNECT_DATA' line to the previous line
$Results[$i] = $Results[$i] + $Results[$i + 1]
# Blank out the '(CONNECT_DATA' line
$Results[$i + 1] = ''
}
}
# Remove all blank lines
$FinalForm = $null
foreach ($Line in $Results) {
if ($Line -ne "") {
$FinalForm += "$Line`n"
}
}
$FinalForm
So the crux of your problem is that you have declared $data as a [string], which is fine because some of your replace operations probably work better on a single string. It's just that $Results then also ends up being a single string, so the operations that index into $Results near the bottom fail. However, you can easily turn $Results into a string array with the -split operator, which eliminates the need to save the string to disk and import it back in. See the comments below.
# Get the file
[System.IO.FileInfo] $File = 'C:\temp\TNSNames.ora'
[string] $data = (Get-Content $File.FullName | Where-Object { !$_.StartsWith('#') }).ToUpper()
# Convert the data. This part is where any (CONNECT_DATA entry ends up on its own line.
$Results = $data.Replace(' ', '').Replace("`t", '').Replace(')))', ")))`n")
# You do not need to do this next section. Essentially this is just saving your multiline string
# to a file and then using Get-Content to read it back in as a string array
# Convert $Results from BaseType of System.Object to System.Array
# $Path = 'c:\temp\StringResults.txt'
# $Results | Out-File -FilePath $Path
# $Results = Get-Content $Path
# Instead split your $Results string into multiple lines using -split
# this will do the same thing as above without writing to file
$Results = $Results -split "\r?\n"
# Find all lines that start with '(CONNECT_DATA'
for ($i = 0; $i -lt $Results.Length - 1; $i++) {
if ($Results[$i + 1].StartsWith('(CONNECT_DATA')) {
# Add the '(CONNECT_DATA' line to the previous line
$Results[$i] = $Results[$i] + $Results[$i + 1]
# Blank out the '(CONNECT_DATA' line
$Results[$i + 1] = ''
}
}
# Remove all blank lines
$FinalForm = $null
foreach ($Line in $Results) {
if ($Line -ne '') {
$FinalForm += "$Line`n"
}
}
$FinalForm
Also, for fun, try this out
((Get-Content 'C:\temp\tnsnames.ora' |
Where-Object {!$_.StartsWith('#') -and ![string]::IsNullOrWhiteSpace($_)}) -join '' -replace '\s' -replace '\)\s?\)\s?\)', ")))`n" -replace '\r?\n\(Connect_data','(connect_data').ToUpper()
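Another way to avoid both the temp file and the ")))" heuristic, as a sketch only: strip comments and whitespace, then walk the text and emit an entry every time the parenthesis depth returns to zero. The function name ConvertTo-SingleLineEntry and the sample lines are assumptions for illustration. Unlike the code above, it does not upper-case anything, which matches the "After File Change" sample in the question.

```powershell
function ConvertTo-SingleLineEntry {
    param([string[]]$Line)
    # Drop comment lines, glue everything together, remove all whitespace
    $text = ($Line | Where-Object { $_ -notmatch '^\s*#' }) -join ''
    $text = $text -replace '\s', ''
    $depth = 0
    $current = New-Object System.Text.StringBuilder
    foreach ($ch in $text.ToCharArray()) {
        [void]$current.Append($ch)
        if ($ch -eq '(') {
            $depth++
        }
        elseif ($ch -eq ')') {
            $depth--
            if ($depth -eq 0) {
                # The outermost parenthesis just closed: one entry is complete
                $current.ToString()
                [void]$current.Clear()
            }
        }
    }
}

# Hypothetical, shortened input in the same shape as TNSNames.ora
$sampleLines = @(
    '#Created 9_27_16'
    'AAAA.world=(DESCRIPTION =(ADDRESS ='
    '   (Port = 1111)'
    '  )'
    ')'
    'CCCC.world=(A=(B=1))'
)
$entries = ConvertTo-SingleLineEntry -Line $sampleLines
$entries
```

For the real file, something like ConvertTo-SingleLineEntry -Line (Get-Content 'C:\temp\TNSNames.ora') should work, assuming the parentheses in every entry are balanced.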

Remove starting content from each line of file

I need to remove the timestamp that is present at the start of each line. How can I achieve this?
I tried to use
regex = "[0-9]{1,2}/[0-9]{1,2}/[0-9]{1,4} [0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2}):[0-9]{1,3}"
and my approach was like
$Check = (Get-Content -Path .\file.txt|Select-Object -last 3|Out-String)
$Check = $Check -replace('$regex','')
The lines in text file.txt would be like :
[06/13/19 08:52:58] The new world
[06/13/19 08:52:58] Computing
[06/13/19 08:52:58] Technology
An alternate method would be to use the -split string operator and split on '] ' (that is, close-bracket and space). Something like this:
('[06/13/19 08:52:58] The new world' -split '] ')[1].Trim()
output = The new world
If it is a static stamp, just use Substring.
$start = Get-Content -Path C:\Windows\Panther\setupact.log -First 1
$start.Substring(11,8)
This, for example, extracts only the timestamp.
In your case:
$start = "[06/13/19 08:52:58] Technology"
$start.Substring(20)
This removes it.
Hope it helps! BR
Edit (see comment):
$content = Get-Content .\Desktop\test.txt
$newLine = ""
foreach($line in $content){
if($line.IndexOf("[") -eq 0){
$newLine += $line.Substring(20) + " "
$prev = $true
}
else{
if($line -ne ""){
Write-Host $newLine
$newLine = ""
}
write-host $line
$prev = $false
}
}
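For completeness, the regex route from the question can also work if the pattern is anchored to the start of the line and includes the brackets. A minimal sketch, assuming every stamp has the exact [MM/dd/yy HH:mm:ss] shape shown in the sample lines; the variable names are made up:

```powershell
# Sample lines copied from the question
$lines = @(
    '[06/13/19 08:52:58] The new world'
    '[06/13/19 08:52:58] Computing'
    '[06/13/19 08:52:58] Technology'
)
# Anchor at the start of the line so brackets later in the text are left alone
$stampPattern = '^\[\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\]\s*'
# -replace works element by element on an array, so no Out-String is needed
$stripped = $lines -replace $stampPattern, ''
$stripped
```

Note that the pattern is kept in a variable and passed unquoted; writing '$regex' in single quotes, as in the question, passes the literal text $regex instead of the pattern.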

Keep lines from line X then delete others if does not contain pattern

I am trying to manipulate a textfile. I want it to keep the first X numbers of lines and after that it should look for a string pattern. If a line contains the pattern it should be kept otherwise deleted.
I got both things to work separately but not together. It works to keep lines until X and remove the rest. And I got it to work to remove all lines except for lines with a pattern, but I can't get it to work for both together.
EDIT: here is the code:
$data = Get-Content test.md
$newdata = ""
$n = 0
Foreach ($line in $data) {
if ($n++ -ge 6) {
$newdata += $line | Where{$_ -match '\[R\]'}
} else {
$newdata += $line
}
$newdata += " `r`n"
}
$newdata > test2.md
The problem is the lines are still there as empty lines. But they should be completely deleted.
$data = Get-Content test.md
$newdata = ""
$n = 0
Foreach ($line in $data) {
if ($n++ -gt 6) {
if ($line -match '\[R\]') {
$newdata += $line + " `r`n"
}
} else {
$newdata += $line + " `r`n"
}
}
$newdata > test2.md
got it to work like that.
You could use
"test.md" | % {
Get-Content $_ -TotalCount 6
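# Select-String takes -Pattern (it has no -match parameter); skip the first 6 lines so they are not emitted twice
(Select-String -Path $_ -Pattern '\[R\]' | Where-Object LineNumber -gt 6).Line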
} | Out-File test2.md -Encoding Ascii
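The same rule can also be written as a single pass over the lines by index, which avoids reading the file twice. This is only a sketch; the function name Select-HeadAndMatch, its parameters, and the sample data are hypothetical.

```powershell
function Select-HeadAndMatch {
    param(
        [string[]]$Line,
        [int]$Keep = 6,
        [string]$Pattern = '\[R\]'
    )
    for ($i = 0; $i -lt $Line.Count; $i++) {
        # Keep the first $Keep lines unconditionally, later lines only on a match
        if (($i -lt $Keep) -or ($Line[$i] -match $Pattern)) {
            $Line[$i]
        }
    }
}

# Hypothetical sample: keep the first 2 lines, then only lines containing [R]
$sampleLines = 'a', 'b', 'c [R]', 'd', 'e [R]'
$kept = Select-HeadAndMatch -Line $sampleLines -Keep 2
$kept
```

Against the file from the question this would be something like Select-HeadAndMatch -Line (Get-Content test.md) | Set-Content test2.md. Lines that fail the test are never emitted, so no empty lines are left behind.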

Use .NET for fast read/write of large files

I am trying to search through a number of large files and replace parts of the text, but I keep running into errors.
I tried this, but sometimes I'll get an 'out of memory' error in PowerShell
#region The Setup
$file = "C:\temp\168MBfile.txt"
$hash = @{
ham = 'bacon'
toast = 'pancakes'
}
#endregion The Setup
$obj = [System.IO.StreamReader]$file
$contents = $obj.ReadToEnd()
$obj.Close()
foreach ($key in $hash.Keys) {
$contents = $contents -replace [regex]::Escape($key), $hash[$key]
}
try {
$obj = [System.IO.StreamWriter]$file
$obj.Write($contents)
} finally {
if ($obj -ne $null) {
$obj.Close()
}
}
Then I tried this (in the ISE), but it crashes with a popup message (sorry, I don't have the error on hand) and tries to restart the ISE
$arraylist = New-Object System.Collections.ArrayList
$obj = [System.IO.StreamReader]$file
while (!$obj.EndOfStream) {
$line = $obj.ReadLine()
foreach ($key in $hash.Keys) {
$line = $line -replace [regex]::Escape($key), $hash[$key]
}
[void]$arraylist.Add($line)
}
$obj.Close()
$arraylist
And finally, I came across something like this, but I'm not sure how to use it properly, and I am not even sure if I am going about this the right way.
$sourcestream = [System.IO.File]::Open($file)
$newstream = [System.IO.File]::Create($file)
$sourcestream.Stream.CopyTo($newstream)
$sourcestream.Close()
Any advice would be greatly appreciated.
You can start with a readcount of 1000 and tweak it based on the performance you get:
get-content textfile -Readcount 1000 |
foreach-object {do something} |
set-content textfile
or
(get-content textfile -Readcount 1000) -replace 'something','withsomething' |
set-content textfile
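If you would rather stay with the .NET classes from the question, a line-by-line copy keeps only one line in memory at a time. A rough sketch under a few assumptions: the function name Copy-WithReplacement is made up, it writes to a separate destination file instead of overwriting the source, and it uses String.Replace, which is case-sensitive (unlike -replace).

```powershell
function Copy-WithReplacement {
    param(
        [string]$Path,          # full path of the source file
        [string]$Destination,   # full path of a different output file
        [hashtable]$Replacement
    )
    $reader = New-Object System.IO.StreamReader($Path)
    try {
        $writer = New-Object System.IO.StreamWriter($Destination)
        try {
            # ReadLine returns $null at end of file
            while ($null -ne ($line = $reader.ReadLine())) {
                foreach ($key in $Replacement.Keys) {
                    # Literal, case-sensitive replacement
                    $line = $line.Replace([string]$key, [string]$Replacement[$key])
                }
                $writer.WriteLine($line)
            }
        }
        finally {
            $writer.Dispose()
        }
    }
    finally {
        $reader.Dispose()
    }
}

# Small demonstration with throwaway files in the temp folder
$src = [System.IO.Path]::GetTempFileName()
$dst = "$src.out"
[System.IO.File]::WriteAllText($src, "ham and toast`nplain")
Copy-WithReplacement -Path $src -Destination $dst -Replacement @{ ham = 'bacon'; toast = 'pancakes' }
$result = [System.IO.File]::ReadAllLines($dst)
$result
```

Use full paths, since the .NET classes resolve relative paths against the process working directory rather than the current PowerShell location. The writer ends every line with a newline and uses its default encoding, so the output may differ from the source in the final line ending or encoding.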

Powershell search through two lines

I have following Input lines in my notepad file.
example 1 :
//UNION TEXT=firststring,FRIEND='ABC,Secondstring,ABAER'
example 2 :
//UNION TEXT=firststring,
// FRIEND='ABC,SecondString,ABAER'
Basically, one line can span over two or three lines. If last character is , then it is treated as continuation character.
In example 1 - Text is in one line.
In example 2 - same Text is in two lines.
In example 1, I can probably write below code. However, I do not know how to do this if 'Input text' spans over two or three lines based on continuation character ,
$result = Get-Content $file.fullName | ? { ($_ -match 'firststring') -and ($_ -match 'secondstring')}
I think I need a way to search text across multiple lines with an '-and' condition, something like that...
Thanks!
You could read the entire content of the file, join the continued lines, and then split the text line-wise:
$text = [System.IO.File]::ReadAllText("C:\path\to\your.txt")
$text -replace ",`r`n", "," -split "`r`n" | ...
# get the full content as one String
$content = Get-Content -Path $file.fullName -Raw
# join continued lines, split content and filter
$content -replace '(?<=,)\s*' -split '\r\n' -match 'firststring.+secondstring'
If file is large and you want to avoid loading entire file into memory you might want to use good old .NET ReadLine:
$reader = [System.IO.File]::OpenText("test.txt")
try {
$sb = New-Object -TypeName "System.Text.StringBuilder";
for(;;) {
$line = $reader.ReadLine()
if ($line -eq $null) { break }
if ($line.EndsWith(','))
{
[void]$sb.Append($line)
}
else
{
[void]$sb.Append($line)
# You have full line at this point.
# Call string match or whatever you find appropriate.
$fullLine = $sb.ToString()
Write-Host $fullLine
[void]$sb.Clear()
}
}
}
finally {
$reader.Close()
}
If the file is not large (let's say < 1 GB), Ansgar Wiechers' answer should do the trick.
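Putting the pieces together for lines already read with Get-Content, here is a sketch that joins continued lines and then applies the -and filter from the question. The function name Join-CommaContinuation and the sample lines are invented, and stripping the leading // from continuation lines is my assumption about the desired result, based on example 1 in the question.

```powershell
function Join-CommaContinuation {
    param([string[]]$Line)
    $buffer = ''
    foreach ($current in $Line) {
        # On a continuation line, drop the leading '//' and its padding (assumption)
        if ($buffer) { $current = $current -replace '^//\s*', '' }
        $buffer += $current
        if (-not $current.TrimEnd().EndsWith(',')) {
            # No trailing comma: the logical line is complete
            $buffer
            $buffer = ''
        }
    }
    # Emit whatever is left if the input ended on a continued line
    if ($buffer) { $buffer }
}

# Hypothetical sample based on example 2 in the question
$sampleLines = @(
    '//UNION TEXT=firststring,'
    "//     FRIEND='ABC,SecondString,ABAER'"
    '//OTHER TEXT=somethingelse'
)
$joined = Join-CommaContinuation -Line $sampleLines
$result = $joined | Where-Object { ($_ -match 'firststring') -and ($_ -match 'secondstring') }
$result
```

-match is case-insensitive by default, so 'secondstring' also finds SecondString.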