PowerShell - Read, Alter Line, Write new line over old in Text Document

I am reading in line by line of a text file. If I see a specific string, I locate the first and last of a specific character, use two substrings to create a smaller string, then replace the line in the text file.
The difficult part: I have the line of text stored in a variable, but cannot figure out how to write this new line over the old line in the text document.
Excuse the crude code - I have been testing things and only started playing with PowerShell a few hours ago.
foreach($line in [System.IO.File]::ReadLines("C:\BatchPractice\test.txt"))
{
Write-Output $line
if ($line.Contains("dsa")) {
Write-Output "TRUEEEEE"
}
$positionF = $line.IndexOf("\")+1
$positionL = $line.LastIndexOf("\")+1
$lengthT = $line.Length
Write-Output ($positionF)
Write-Output $positionL
Write-Output $lengthT
if($line.Contains("\")){
Write-Output "Start"
$combine = $line.Substring(0,$positionF-1) + $line.Substring($positionL,($lengthT-$positionL))
Write-Output $combine
$line1 = $line.Substring(0,$positionF-1)
$line2 = $line.Substring($positionL,($lengthT-$positionL))
$combined = $line1 + $line2
Write-Output $combined
Write-Output "Close"
}
}

You can read the file into an array of lines with Get-Content, edit it, and write it back with Set-Content:
$file=(Get-Content "C:\BatchPractice\test.txt")
Then you can edit it like arrays:
$file[LINE_NUMBER]="New line"
Where LINE_NUMBER is the line number starting from 0.
And then overwrite to file:
$file|Set-Content "C:\BatchPractice\test.txt"
To use this in your loop, create a counter $i=0 before the loop and increment it at the end of each iteration. $i is then the index of the current line.
HTH
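The indexed loop described above might look like the following minimal sketch. The temporary file and its two sample lines are assumptions made so the example is self-contained; the substring logic mirrors the IndexOf/LastIndexOf approach from the question.

```powershell
# Hypothetical sample file so the sketch can run on its own.
$path = [System.IO.Path]::GetTempFileName()
Set-Content -Path $path -Value 'abc\dsa\ghi\jkl', 'no backslash here'

# @(...) guarantees an array even if the file has only one line.
$file = @(Get-Content -Path $path)

for ($i = 0; $i -lt $file.Count; $i++) {
    $line  = $file[$i]
    $first = $line.IndexOf('\')
    $last  = $line.LastIndexOf('\')
    if ($first -ge 0) {
        # Keep the text before the first backslash and after the last one.
        $file[$i] = $line.Substring(0, $first) + $line.Substring($last + 1)
    }
}

# Write the modified array back over the original file.
$file | Set-Content -Path $path
$result = @(Get-Content -Path $path)
```

Because the whole file is read into $file before Set-Content runs, writing back to the same path is safe here.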

Based on your code it seems you want to take any line that contains 'dsa' and remove the contents after the first backslash up until the last backslash. If that's the case I'd recommend simplifying your code with regex. First I made a sample file since none was provided.
$tempfile = New-TemporaryFile
@'
abc\def\ghi\jkl
abc\dsa\ghi\jkl
zyx\vut\dsa\srq
zyx\vut\srq\pon
'@ | Set-Content $tempfile -Encoding UTF8
Now we will read in all lines (unless this is a massive file)
$text = Get-Content $tempfile -Encoding UTF8
Next we'll make a regex object with the pattern we want to replace. The double backslash is to escape the backslash since it has meaning to regex.
$regex = [regex]'(?<=.+\\).+\\'
Now we will loop over every line, if it has dsa in it we will run the replace against it, otherwise we will output the line.
$text | ForEach-Object {
if($_.contains('dsa'))
{
$regex.Replace($_,'')
}
else
{
$_
}
} -OutVariable newtext
You'll see the output on the screen, but it's also captured in the $newtext variable. I recommend confirming it is the output you are after before writing it back.
abc\def\ghi\jkl
abc\jkl
zyx\srq
zyx\vut\srq\pon
Once confirmed, simply write it back to the file.
$newtext | Set-Content $tempfile -Encoding UTF8
You can obviously combine the steps as well.
$text | ForEach-Object {
if($_.contains('dsa'))
{
$regex.Replace($_,'')
}
else
{
$_
}
} | Set-Content $tempfile -Encoding UTF8

Related

Re-assembling split file names with Powershell

I'm having trouble re-assembling certain filenames (and discarding the rest) from a text file. The filenames are split up (usually on three lines) and there is always a blank line after each filename. I only want to keep filenames that begin with OPEN or FOUR. An example is:
OPEN.492820.EXTR
A.STANDARD.38383
333

FOUR.383838.282.
STAND.848484.NOR
MAL.3939

CLOSE.3480384.ST
ANDARD.39393939.
838383
The output I'd like would be:
OPEN.492820.EXTRA.STANDARD.38383333
FOUR.383838.282.STAND.848484.NORMAL.3939
Thanks for any suggestions!
The following worked for me, you can give it a try.
See https://regex101.com/r/JuzXOb/1 for the Regex explanation.
$source = 'fullpath/to/inputfile.txt'
$destination = 'fullpath/to/resultfile.txt'
[regex]::Matches(
(Get-Content $source -Raw),
'(?msi)^(OPEN|FOUR)(.*?|\s*?)+([\r\n]$|\z)'
).Value.ForEach({ -join($_ -split '\r?\n').ForEach('Trim') }) |
Out-File $destination
For testing:
$txt = @'
OPEN.492820.EXTR
A.STANDARD.38383
333

FOUR.383838.282.
STAND.848484.NOR
MAL.3939

CLOSE.3480384.ST
ANDARD.39393939.
838383

OPEN.492820.EXTR
A.EXAMPLE123

FOUR.383838.282.
STAND.848484.123
ZXC
'@
[regex]::Matches(
$txt,
'(?msi)^(OPEN|FOUR)(.*?|\s*?)+([\r\n]$|\z)'
).Value.ForEach({ -join($_ -split '\r?\n').ForEach('Trim') })
Output:
OPEN.492820.EXTRA.STANDARD.38383333
FOUR.383838.282.STAND.848484.NORMAL.3939
OPEN.492820.EXTRA.EXAMPLE123
FOUR.383838.282.STAND.848484.123ZXC
Read the file one line at a time and keep concatenating them until you encounter a blank line, at which point you output the concatenated string and repeat until you reach the end of the file:
# this variable will keep track of the partial file names
$fileName = ''
# use a switch to read the file and process each line
switch -Regex -File ('path\to\file.txt') {
# when we see a blank line...
'^\s*$' {
# ... we output the accumulated name if it starts with the right word
if($fileName -cmatch '^(OPEN|FOUR)'){ $fileName }
# and then start over
$fileName = ''
}
default {
# must be a non-blank line, concatenate it to the previous ones
$fileName += $_
}
}
# remember to check and output the last one
if($fileName -cmatch '^(OPEN|FOUR)'){
$fileName
}
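The same accumulate-until-a-blank-line idea can be tried without a file. The sketch below wraps it in a function over an array of lines; the function name Join-SplitName and the sample array are hypothetical and only serve the illustration.

```powershell
# Hypothetical helper: joins consecutive non-blank lines into one name and
# emits it only when it starts with OPEN or FOUR (case-sensitive).
function Join-SplitName {
    param([string[]] $Lines)

    $name = ''
    foreach ($line in $Lines) {
        if ($line -match '^\s*$') {
            # Blank line: the current name is complete.
            if ($name -cmatch '^(OPEN|FOUR)') { $name }
            $name = ''
        }
        else {
            $name += $line.Trim()
        }
    }
    # The last name may not be followed by a blank line.
    if ($name -cmatch '^(OPEN|FOUR)') { $name }
}

# Sample input taken from the question, with blank lines as empty strings.
$sample = 'OPEN.492820.EXTR', 'A.STANDARD.38383', '333', '',
          'CLOSE.3480384.ST', 'ANDARD.39393939.', '838383', '',
          'FOUR.383838.282.', 'STAND.848484.NOR', 'MAL.3939'

$names = @(Join-SplitName -Lines $sample)
```

With a real file, the lines could be supplied with Get-Content, at the cost of reading the file into memory first.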

Powershell - Count number of carriage returns line feed in .txt file

I have a large text file (output from SQL db) and I need to determine the row count. However, since the source SQL data itself contains carriage returns \r and line feeds \n (NEVER appearing together), the data for some rows spans multiple lines in the output .txt file. The Powershell I'm using below gives me the file line count which is greater than the actual SQL row count. So I need to modify the script to ignore the additional lines - one way of doing it might be just counting the number of times CRLF or \r\n occurs (TOGETHER) in the file and that should be the actual number of rows but I'm not sure how to do it.
Get-ChildItem "." |% {$n = $_; $c = 0; Get-Content -Path $_ -ReadCount 1000 |% { $c += $_.Count }; "$n; $c"} > row_count.txt
I just learned that Get-Content splits and streams the lines of a file on CR, CRLF, and LF alike, so that it can read files from different operating systems interchangeably:
"1`r2`n3`r`n4" | Out-File .\Test.txt
(Get-Content .\Test.txt).Count
4
Reading the question again, I might have misunderstood your question.
In any case, if you want to split (count) on only a specific character combination:
CR
((Get-Content -Raw .\Test.txt).Trim() -Split '\r').Count
3
LF
((Get-Content -Raw .\Test.txt).Trim() -Split '\n').Count
3
CRLF
((Get-Content -Raw .\Test.txt).Trim() -Split '\r\n').Count # or: -Split [Environment]::NewLine
2
Note the .Trim() method, which removes the trailing newline that Out-File added at the end of the file; Get-Content -Raw returns it as part of the string.
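As a variation on the split-and-count approach, the CRLF pairs can also be counted directly, which avoids the need for .Trim(). This is a minimal sketch on an in-memory sample string (an assumption for illustration); with a file, the string would come from Get-Content -Raw, which still loads the whole file into memory.

```powershell
# Sample text: one lone CR, one lone LF, and two CRLF pairs.
$text = "1`r2`n3`r`n4`r`n"

# Count only the CR-LF pairs; lone CR or LF characters are not matched.
$crlfCount = [regex]::Matches($text, "`r`n").Count
```

Here $crlfCount is the number of CRLF-terminated rows, which is what the question asks for when every SQL row ends in CRLF.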
Addendum
(Update based on the comment on the memory exception)
I am afraid that there is currently no other option than building your own reader on top of StreamReader, using the ReadBlock method and splitting lines specifically on CRLF. I have opened a feature request for this issue: -NewLine Parameter to customize line separator for Get-Content
Get-Lines
A possible way to work around the memory exception errors:
function Get-Lines {
[CmdletBinding()][OutputType([string])] param(
[Parameter(ValueFromPipeLine = $True)][string] $Filename,
[String] $NewLine = [Environment]::NewLine
)
Begin {
[Char[]] $Buffer = new-object Char[] 10
$Reader = New-Object -TypeName System.IO.StreamReader -ArgumentList (Get-Item($Filename))
$Rest = '' # Note that a multiple character newline (as CRLF) could be split at the end of the buffer
}
Process {
While ($True) {
$Length = $Reader.ReadBlock($Buffer, 0, $Buffer.Length)
if (!$length) { Break }
$Split = ($Rest + [string]::new($Buffer[0..($Length - 1)])) -Split $NewLine
If ($Split.Count -gt 1) { $Split[0..($Split.Count - 2)] }
$Rest = $Split[-1]
}
}
End {
$Rest
# Release the file handle once the whole file has been read
$Reader.Close()
}
}
Usage
To prevent memory exceptions, it is important that you do not assign the results to a variable or wrap the call in parentheses, as this will stall the PowerShell pipeline and store everything in memory.
$Count = 0
Get-Lines .\Test.txt | ForEach-Object { $Count++ }
$Count
The System.IO.StreamReader.ReadBlock solution that reads the file in fixed-size blocks and performs custom splitting into lines in iRon's helpful answer is the best choice, because it both avoids out-of-memory problems and performs well (by PowerShell standards).
If performance in terms of execution speed isn't paramount, you can take advantage of
Get-Content's -Delimiter parameter, which accepts a custom string to split the file content by:
# Outputs the count of CRLF-terminated lines.
(Get-Content largeFile.txt -Delimiter "`r`n" | Measure-Object).Count
Note that -Delimiter employs optional-terminator logic when splitting: that is, if the file content ends in the given delimiter string, no extra, empty element is reported at the end.
This is consistent with the default behavior, where a trailing newline in a file is considered an optional terminator that does not result in an additional, empty line getting reported.
However, in case a -Delimiter string that is unrelated to newline characters is used, a trailing newline is considered a final "line" (element).
A quick example:
# Create a test file without a trailing newline.
# Note the CR-only newline (`r) after 'line1'
"line1`rrest of line1`r`nline2" | Set-Content -NoNewLine test1.txt
# Create another test file with the same content plus
# a trailing CRLF newline.
"line1`rrest of line1`r`nline2`r`n" | Set-Content -NoNewLine test2.txt
'test1.txt', 'test2.txt' | ForEach-Object {
"--- $_"
# Split by CRLF only and enclose the resulting lines in [...]
Get-Content $_ -Delimiter "`r`n" |
ForEach-Object { "[{0}]" -f ($_ -replace "`r", '`r') }
}
This yields:
--- test1.txt
[line1`rrest of line1]
[line2]
--- test2.txt
[line1`rrest of line1]
[line2]
As you can see, the two test files were processed identically, because the trailing CRLF newline was considered an optional terminator for the last line.

How to make changes to file content and save it to another file using powershell?

I want to do this
read the file
go through each line
if the line matches the pattern, do some changes with that line
save the content to another file
For now I use this script:
$file = [System.IO.File]::ReadLines("C:\path\to\some\file1.txt")
$output = "C:\path\to\some\file2.txt"
ForEach ($line in $file) {
if($line -match 'some_regex_expression') {
$line = $line.replace("some","great")
}
Out-File -append -filepath $output -inputobject $line
}
As you can see, here I write line by line. Is it possible to write the whole file at once?
A good example is provided here:
(Get-Content c:\temp\test.txt) -replace '\[MYID\]', 'MyValue' | Set-Content c:\temp\test.txt
But my problem is that I have additional IF statement...
So, what could I do to improve my script?
You could do it like that:
Get-Content -Path "C:\path\to\some\file1.txt" | foreach {
if($_ -match 'some_regex_expression') {
$_.replace("some","great")
}
else {
$_
}
} | Out-File -filepath "C:\path\to\some\file2.txt"
Get-Content reads a file line by line (array of strings) by default so you can just pipe it into a foreach loop, process each line within the loop and pipe the whole output into your file2.txt.
In this case an array or an ArrayList (lists are better for large collections) would be the most elegant solution. Simply add the strings to the list until the ForEach loop ends, then flush the list to the file in one go.
This is Array List example
$file = [System.IO.File]::ReadLines("C:\path\to\some\file1.txt")
$output = "C:\path\to\some\file2.txt"
$outputData = New-Object System.Collections.ArrayList
ForEach ($line in $file) {
if($line -match 'some_regex_expression') {
$line = $line.replace("some","great")
}
# [void] suppresses the index that ArrayList.Add returns
[void]$outputData.Add($line)
}
$outputData |Out-File $output
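A variation on the same idea uses System.Collections.Generic.List[string], whose Add method returns nothing, so no output needs to be suppressed. This is only a sketch: the temporary files, the sample lines, and the '^some' pattern are assumptions for illustration, and the ::new() syntax requires PowerShell 5 or later.

```powershell
# Hypothetical sample input and output files so the sketch is self-contained.
$inputPath  = [System.IO.Path]::GetTempFileName()
$outputPath = [System.IO.Path]::GetTempFileName()
Set-Content -Path $inputPath -Value 'some text', 'other text'

$outputData = [System.Collections.Generic.List[string]]::new()
foreach ($line in [System.IO.File]::ReadLines($inputPath)) {
    if ($line -match '^some') {
        $line = $line.Replace('some', 'great')
    }
    # List[string].Add returns nothing, so the pipeline stays clean.
    $outputData.Add($line)
}

# Write all collected lines to the output file in one go.
$outputData | Set-Content -Path $outputPath
$written = @(Get-Content -Path $outputPath)
```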
I think the if statement can be avoided in a lot of cases by using regular expression groups (e.g. (.*)) and placeholders (e.g. $1, $2, etc.).
As in your example:
(Get-Content .\File1.txt) -Replace 'some(_regex_expression)', 'great$1' | Set-Content .\File2.txt
And for the "good example", where [MYID] might be somewhere inline:
(Get-Content c:\temp\test.txt) -Replace '^(.*)\[MYID\](.*)$', '$1MyValue$2' | Set-Content c:\temp\test.txt
(see also How to replace first and last part of each line with powershell)

Piping Replace to New File

$x = Get-Content($file)
if ($x -match("~")) {
$x -replace("~","~`n") | Out-File $file
}
This is the snippet of code I am using. I have debugged up to this point, and the file isn't updated after I replace the tilde character ~ with itself followed by a new line. When I output the result to the command window and comment out the | Out-File $file, the code works fine. When I try to pipe the new result back into the original file, the code doesn't "unwrap" the file.
The replacement works just fine. However, you're inserting just linefeed characters (LF, `n), not the combination of carriage-return and linefeed (CR-LF, `r`n) that Windows uses for encoding line breaks. Because of that you don't see line breaks when opening the file in Notepad. PowerShell accepts both LF and CR-LF as line break encoding, so you see correctly wrapped lines when you output the file there.
Change your code to this and you'll get the expected result:
(Get-Content $file) -replace '~', "~`r`n" | Set-Content $file
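The difference between the two line-break encodings can be checked on a string, without involving a file. In this small sketch the sample content is hypothetical; it shows that both replacements yield the same logical lines, while only the second inserts the carriage returns that older versions of Notepad rely on.

```powershell
$line = 'seg1~seg2~seg3'   # hypothetical sample content

$lfOnly = $line -replace '~', "~`n"      # LF only
$crlf   = $line -replace '~', "~`r`n"    # CR-LF

# Both versions split into the same three logical lines...
$lfParts   = $lfOnly -split "`r?`n"
$crlfParts = $crlf -split "`r?`n"

# ...but only the CR-LF version contains carriage returns.
$lfHasCr   = $lfOnly.Contains("`r")
$crlfHasCr = $crlf.Contains("`r")
```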
My mistake. I was calling reader method above.
ForEach ($file in $Path){
$Array = @()
$reader = new-object System.IO.StreamReader($file)
I needed an array to determine if the file needed to be unwrapped to begin with. If the contents took up one line, it needed to be unwrapped; if not, it did not. I essentially used a StreamReader, which stores each line as an element in an array. I forgot to close $reader beforehand, so we had a producer-consumer issue and Out-File could not overwrite $file.
Fixed Snippet:
if($Array.length -eq 1){
$x = Get-Content($file)
if($x -match("~")){
$reader.close()
($x -replace("~","~`n")) | Out-File $file
}
}
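One way to make sure the reader never keeps the file locked is to close it in a finally block before any write happens. The following is a minimal sketch under assumptions: the temporary file and its single-line content are hypothetical, and it writes CR-LF line breaks with Set-Content rather than reproducing the snippet above exactly.

```powershell
# Hypothetical sample file containing a single "wrapped" line.
$file = [System.IO.Path]::GetTempFileName()
Set-Content -Path $file -Value 'a~b~c'

$lines  = @()
$reader = New-Object System.IO.StreamReader($file)
try {
    # Collect every line so we can tell whether the file is wrapped.
    while ($null -ne ($l = $reader.ReadLine())) { $lines += $l }
}
finally {
    # Release the file handle before anything tries to write to the file.
    $reader.Close()
}

# A single line containing a tilde means the file needs to be unwrapped.
if ($lines.Count -eq 1 -and $lines[0] -match '~') {
    ($lines[0] -replace '~', "~`r`n") | Set-Content -Path $file
}
$unwrapped = @(Get-Content -Path $file)
```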

Powershell search through two lines

I have following Input lines in my notepad file.
example 1 :
//UNION TEXT=firststring,FRIEND='ABC,Secondstring,ABAER'
example 2 :
//UNION TEXT=firststring,
// FRIEND='ABC,SecondString,ABAER'
Basically, one line can span over two or three lines. If last character is , then it is treated as continuation character.
In example 1 - Text is in one line.
In example 2 - same Text is in two lines.
In example 1, I can probably write below code. However, I do not know how to do this if 'Input text' spans over two or three lines based on continuation character ,
$result = Get-Content $file.fullName | ? { ($_ -match 'firststring') -and ($_ -match 'secondstring')}
I think I need a way to search text across multiple lines with an '-and' condition, something like that...
Thanks!
You could read the entire content of the file, join the continued lines, and then split the text line-wise:
$text = [System.IO.File]::ReadAllText("C:\path\to\your.txt")
$text -replace ",`r`n", "," -split "`r`n" | ...
# get the full content as one String
$content = Get-Content -Path $file.fullName -Raw
# join continued lines, split content and filter
$content -replace '(?<=,)\s*' -split '\r\n' -match 'firststring.+secondstring'
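The join-then-filter approach can be tried on an in-memory string before pointing it at a file. In this sketch the sample text (including the third, non-matching line) is an assumption based on example 2 from the question, and the join pattern ',\r?\n' is a slightly simplified stand-in for the patterns above.

```powershell
# Hypothetical sample standing in for the raw file content.
$content = "//UNION TEXT=firststring,`r`n// FRIEND='ABC,Secondstring,ABAER'`r`n//OTHER TEXT=nothing"

# 1. Join continued lines: drop the line break that follows a trailing comma.
$joined = $content -replace ',\r?\n', ','

# 2. Split into logical lines and keep those containing both strings in order.
#    -match is case-insensitive, so 'Secondstring' also matches.
$hits = @($joined -split '\r?\n' -match 'firststring.+secondstring')
```

Applied to an array, -match returns the matching elements rather than a Boolean, which is why it can serve as a filter here.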
If the file is large and you want to avoid loading it entirely into memory, you might want to use good old .NET ReadLine:
$reader = [System.IO.File]::OpenText("test.txt")
try {
$sb = New-Object -TypeName "System.Text.StringBuilder";
for(;;) {
$line = $reader.ReadLine()
if ($line -eq $null) { break }
if ($line.EndsWith(','))
{
[void]$sb.Append($line)
}
else
{
[void]$sb.Append($line)
# You have full line at this point.
# Call string match or whatever you find appropriate.
$fullLine = $sb.ToString()
Write-Host $fullLine
[void]$sb.Clear()
}
}
}
finally {
$reader.Close()
}
If the file is not large (let's say < 1 GB), Ansgar Wiechers' answer should do the trick.