I want to combine every two lines from the input below into one. Here is the input:
ALPHA-FETOPROTEIN ROUTINE CH 0203 001 02/03/2023#10:45 LIVERF3
###-##-#### #######,#### In lab
ALPHA-FETOPROTEIN ROUTINE CH 0203 234 02/03/2023#11:05 LIVER
###-##-#### ########,######## In lab
ANION GAP STAT CH 0203 124 02/03/2023#11:06 DAY
###-##-#### ######,##### #### In lab
BASIC METABOLIC PANE ROUTINE CH 0203 001 02/03/2023#10:45 LIVERF3
###-##-#### #######,#### ###### In lab
This is the desired output
ALPHA-FETOPROTEIN ROUTINE CH 0203 001 02/03/2023#10:45 LIVERF3 ###-##-#### #######,#### In lab
ALPHA-FETOPROTEIN ROUTINE CH 0203 234 02/03/2023#11:05 LIVER ###-##-#### ########,######## In lab
ANION GAP STAT CH 0203 124 02/03/2023#11:06 DAY ###-##-#### ######,##### #### In lab
BASIC METABOLIC PANE ROUTINE CH 0203 001 02/03/2023#10:45 LIVERF3 ###-##-#### #######,#### ###### In lab
The code that I have tried is
for($i = 0; $i -lt $splitLines.Count; $i += 2){
$splitLines[$i,($i+1)] -join ' '
}
It came from Joining every two lines in PowerShell output, but I can't seem to get it to work for me. I'm not well versed with PowerShell, but I'm at the mercy of what's available at work.
Edit: Here is the entire code that I am using, as requested.
# SET VARIABLES
$inputfile = "C:\Users\Will\Desktop\testfile.txt"
$outputfile = "C:\Users\Will\Desktop\testfileformatted.txt"
$new_output = "C:\Users\Will\Desktop\new_formatted.txt"
# REMOVE EXTRA CHARACTERS
$remove_beginning_capture = "-------------------------------------------------------------------------------"
$remove_end_capture = "==============================================================================="
$remove_line = "------"
$remove_strings_with_spaces = " \d"
Get-Content $inputfile | Where-Object {$_ -notmatch $remove_beginning_capture} | Where-Object {$_ -notmatch $remove_end_capture} | Where-Object {$_ -notmatch $remove_line} | Where-Object {$_ -notmatch $remove_strings_with_spaces} | ? {$_.trim() -ne "" } | Set-Content $outputfile
# Measures line length for loop
$file_lines = gc $outputfile | Measure-Object
#Remove Whitespace
# $whitespace_removed = (Get-Content $outputfile -Raw) -replace '\s+', ' '| Set-Content -Path C:\Users\Will\Desktop\new_formatted.csv
# Combine every other line
$lines = Get-Content $outputfile -Raw
$newcontent = $lines.Replace("`n","")
Write-Host "Content: $newcontent"
$newcontent | Set-Content $new_output
for($i = 0; $i -lt $splitLines.Count; $i += 2){
$splitLines[$i,($i+1)] -join ' '
}
Just read two lines at a time and then write one combined line:
$inputFilename = "c:\temp\test.txt"
$outputFilename = "c:\temp\test1.txt"
$reader = [System.IO.StreamReader]::new($inputFilename)
$writer = [System.IO.StreamWriter]::new($outputFilename)
while(($line = $reader.ReadLine()) -ne $null)
{
$secondLine = ""
if(!$reader.EndOfStream){ $secondLine = $reader.ReadLine() }
$writer.WriteLine($line + $secondLine)
}
$reader.Close()
$writer.Flush()
$writer.Close()
PowerShell-idiomatic solutions:
Use Get-Content with -ReadCount 2 in order to read the lines from your file in pairs, which allows you to process each pair in a ForEach-Object call, where the constituent lines can be joined to form a single output line.
Get-Content -ReadCount 2 yourFile.txt |
ForEach-Object { $_[0] + ' ' + $_[1].TrimStart() }
The above directly outputs the resulting lines (as the for command in your question does), causing them to print to the display by default.
Pipe to Set-Content to save the output to a file:
Get-Content -ReadCount 2 yourFile.txt |
ForEach-Object { $_[0] + ' ' + $_[1].TrimStart() } |
Set-Content yourOutputFile.txt
Performance notes:
Unfortunately (as of PowerShell 7.3.2), Get-Content is quite slow by default - see GitHub issue #7537, and the performance of ForEach-Object and Where-Object could be improved too - see GitHub issue #10982.
At the expense of collecting all inputs and outputs in memory first, you can noticeably improve the performance with the following variation, which avoids the ForEach-Object cmdlet in favor of the intrinsic .ForEach() method, and, instead of piping to Set-Content, passes all output lines via the -Value parameter:
Set-Content yourOutputFile.txt -Value (
  (Get-Content -ReadCount 2 yourFile.txt).ForEach({ $_[0] + ' ' + $_[1].TrimStart() })
)
Read on for even faster alternatives, but remember that optimizations are only worth undertaking if actually needed - if the first PowerShell-idiomatic solution above is fast enough in practice, it is worth using for its conceptual elegance and concision.
See this Gist for benchmarks that compare the relative performance of the solutions in this answer as well as that of the solution from jdweng's .NET API-based answer.
A better-performing alternative is to use a switch statement with the -File parameter to process files line by line:
$i = 1
switch -File yourFile.txt {
default {
if ($i++ % 2) { $firstLineInPair = $_ }
else { $firstLineInPair + ' ' + $_.TrimStart() }
}
}
Helper index variable $i and the modulo operation (%) are simply used to identify which line is the start of a (new) pair, and which one is its second half.
The switch statement is itself streaming, but it cannot be used as-is as pipeline input. By enclosing it in & { ... }, it can, but that forfeits some of the performance benefits, making it only marginally faster than the optimized Get-Content -ReadCount 2 solution:
& {
$i = 1
switch -File yourFile.txt {
default {
if ($i++ % 2) { $firstLineInPair = $_ }
else { $firstLineInPair + ' ' + $_.TrimStart() }
}
}
} | Set-Content yourOutputFile.txt
For the best performance when writing to a file, use Set-Content $outFile -Value $(...), albeit at the expense of collecting all output lines in memory first:
Set-Content yourOutputFile.txt -Value $(
$i = 1
switch -File yourFile.txt {
default {
if ($i++ % 2) { $firstLineInPair = $_ }
else { $firstLineInPair + ' ' + $_.TrimStart() }
}
}
)
The fastest and most concise solution is to use a regex-based approach, which reads the entire file up front:
(Get-Content -Raw yourFile.txt) -replace '(.+?)\r?\n(?: *)(.+\r?\n)', '$1 $2'
Note:
The assumption is that all lines are paired, and that the last line has a trailing newline.
The -replace operation matches two consecutive lines, and joins them together with a space, ignoring leading spaces on the second line. For a detailed explanation of the regex and the ability to interact with it, see this regex101.com page.
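For a quick in-memory sanity check of the same -replace (no file needed; the four-line CRLF input string below is made up for illustration):
"AAA`r`n  BBB`r`nCCC`r`n  DDD`r`n" -replace '(.+?)\r?\n(?: *)(.+\r?\n)', '$1 $2'
# -> "AAA BBB`r`nCCC DDD`r`n"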
To save the output to a file, you can pipe directly to Set-Content:
(Get-Content -Raw yourFile.txt) -replace '(.+?)\r?\n(?: *)(.+\r?\n)', '$1 $2' |
Set-Content yourOutputFile.txt
In this case, because the pipeline input to Set-Content is provided by an expression that doesn't involve for-every-input-line calls to script blocks ({ ... }) (as the switch solution requires), there is virtually no slowdown resulting from use of the pipeline (whose use is generally preferable for conceptual elegance and concision).
As for what you tried:
The $splitLines-based solution in your question is predicated on having assigned all lines of the input file to this self-chosen variable as an array, which your code does not do.
While you could fill variable $splitLines with an array of lines from your input file with $splitLines = Get-Content yourFile.txt (Get-Content reads text files line by line by default), the switch-based line-by-line solution is more efficient and streams its results. Streaming - if the output is saved to a file - keeps memory usage constant, which matters with large input sets (though rarely with text files).
A performance tip when reading all lines at once into an array with Get-Content: use -ReadCount 0, which greatly speeds up the operation:
$splitLines = Get-Content -ReadCount 0 yourFile.txt
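For completeness, here is a minimal sketch of your original loop once $splitLines is actually populated (it assumes the file's lines are strictly paired; like the switch statement above, a for statement must be wrapped in & { ... } before it can feed a pipeline):
$splitLines = Get-Content -ReadCount 0 yourFile.txt
for($i = 0; $i -lt $splitLines.Count; $i += 2){
$splitLines[$i,($i+1)] -join ' '
}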
Related
I have a large text file (output from a SQL db) and I need to determine the row count. However, since the source SQL data itself contains carriage returns \r and line feeds \n (never appearing together), the data for some rows spans multiple lines in the output .txt file. The PowerShell I'm using below gives me the file line count, which is greater than the actual SQL row count. So I need to modify the script to ignore the additional lines. One way of doing it might be to count the number of times CRLF (\r\n) occurs together in the file; that should be the actual number of rows, but I'm not sure how to do it.
Get-ChildItem "." |% {$n = $_; $c = 0; Get-Content -Path $_ -ReadCount 1000 |% { $c += $_.Count }; "$n; $c"} > row_count.txt
I just learned myself that Get-Content splits and streams each line in a file by CR, CRLF, and LF, so that it can read data between operating systems interchangeably:
"1`r2`n3`r`n4" | Out-File .\Test.txt
(Get-Content .\Test.txt).Count
4
Reading the question again, I might have misunderstood your question.
In any case, if you want to split (count) on only a specific character combination:
CR
((Get-Content -Raw .\Test.txt).Trim() -Split '\r').Count
3
LF
((Get-Content -Raw .\Test.txt).Trim() -Split '\n').Count
3
CRLF
((Get-Content -Raw .\Test.txt).Trim() -Split '\r\n').Count # or: -Split [Environment]::NewLine
2
Note the .Trim() method, which removes the extra newline (whitespace) at the end of the file that Get-Content -Raw would otherwise include.
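To see the difference .Trim() makes, using the Test.txt created above (counts assume the trailing CRLF that Out-File appended on Windows):
((Get-Content -Raw .\Test.txt) -Split '\r\n').Count # 3 - the trailing CRLF yields an extra, empty element
((Get-Content -Raw .\Test.txt).Trim() -Split '\r\n').Count # 2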
Addendum
(Update based on the comment on the memory exception)
I am afraid that there is currently no other option than building your own StreamReader using the ReadBlock method and specifically splitting lines on CRLF. I have opened a feature request for this issue: -NewLine Parameter to customize line separator for Get-Content
Get-Lines
A possible way to workaround the memory exception errors:
function Get-Lines {
[CmdletBinding()][OutputType([string])] param(
[Parameter(ValueFromPipeLine = $True)][string] $Filename,
[String] $NewLine = [Environment]::NewLine
)
Begin {
[Char[]] $Buffer = New-Object Char[] 10 # small buffer for demonstration; a larger buffer (e.g. 64KB) performs better
$Reader = New-Object -TypeName System.IO.StreamReader -ArgumentList (Get-Item($Filename))
$Rest = '' # Note that a multiple character newline (as CRLF) could be split at the end of the buffer
}
Process {
While ($True) {
$Length = $Reader.ReadBlock($Buffer, 0, $Buffer.Length)
if (!$length) { Break }
$Split = ($Rest + [string]::new($Buffer[0..($Length - 1)])) -Split $NewLine
If ($Split.Count -gt 1) { $Split[0..($Split.Count - 2)] }
$Rest = $Split[-1]
}
}
End {
$Rest
}
}
Usage
To prevent the memory exceptions it is important that you do not assign the results to a variable or use brackets, as this will stall the PowerShell pipeline and store everything in memory.
$Count = 0
Get-Lines .\Test.txt | ForEach-Object { $Count++ }
$Count
The System.IO.StreamReader.ReadBlock solution that reads the file in fixed-size blocks and performs custom splitting into lines in iRon's helpful answer is the best choice, because it both avoids out-of-memory problems and performs well (by PowerShell standards).
If performance in terms of execution speed isn't paramount, you can take advantage of
Get-Content's -Delimiter parameter, which accepts a custom string to split the file content by:
# Outputs the count of CRLF-terminated lines.
(Get-Content largeFile.txt -Delimiter "`r`n" | Measure-Object).Count
Note that -Delimiter employs optional-terminator logic when splitting: that is, if the file content ends in the given delimiter string, no extra, empty element is reported at the end.
This is consistent with the default behavior, where a trailing newline in a file is considered an optional terminator that does not result in an additional, empty line getting reported.
However, in case a -Delimiter string that is unrelated to newline characters is used, a trailing newline is considered a final "line" (element).
A quick example:
# Create a test file without a trailing newline.
# Note the CR-only newline (`r) after 'line 1'
"line1`rrest of line1`r`nline2" | Set-Content -NoNewLine test1.txt
# Create another test file with the same content plus
# a trailing CRLF newline.
"line1`rrest of line1`r`nline2`r`n" | Set-Content -NoNewLine test2.txt
'test1.txt', 'test2.txt' | ForEach-Object {
"--- $_"
# Split by CRLF only and enclose the resulting lines in [...]
Get-Content $_ -Delimiter "`r`n" |
ForEach-Object { "[{0}]" -f ($_ -replace "`r", '`r') }
}
This yields:
--- test1.txt
[line1`rrest of line1]
[line2]
--- test2.txt
[line1`rrest of line1]
[line2]
As you can see, the two test files were processed identically, because the trailing CRLF newline was considered an optional terminator for the last line.
$ready = Read-Host "How many you want?: "
$i = 0
do{
(-join(1..12 | ForEach {((65..90)+(97..122)+(".") | % {[char]$_})+(0..9)+(".") | Get-Random}))
$i++
} until ($i -match $ready) Out-File C:/numbers.csv -Append
If I give a value of 10 to the script - it will generate 10 random numbers and shows it on pshell. It even generates new file called numbers.csv. However, it does not add the generated output to the file. Why is that?
Your Out-File C:/numbers.csv -Append call is a completely separate statement from your do loop, and an Out-File call without any input simply creates an empty file.[1]
You need to chain (connect) commands with | in order to make them run in a pipeline.
However, with a statement such as a do { ... } until loop, this won't work as-is, but you can convert such a statement to a command that you can use as part of a pipeline by enclosing it in a script block ({ ... }) and invoking it with &, the call operator (to run in a child scope), or ., the dot-sourcing operator (to run directly in the caller's scope):
[int] $ready = Read-Host "How many you want?"
$i = 0
& {
do{
-join (1..12 | foreach {
(65..90 + 97..122 + '.' | % { [char] $_ }) +(0..9) + '.' | Get-Random
})
$i++
} until ($i -eq $ready)
} | Out-File C:/numbers.csv -Append
Note the [int] type constraint to convert the Read-Host output, which is always a string, to a number, and the use of the -eq operator rather than the text- and regex-based -match operator in the until condition; also, unnecessary grouping with (...) has been removed.
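To see why -match is the wrong operator for the loop condition: it treats its operands as strings and performs a regex (substring) match, so it can succeed in unintended ways:
13 -match 3 # True - the regex '3' matches inside the string '13'
13 -eq 3 # False - numeric comparison, as intended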
Note: An alternative to the use of a script block with either the & or . operator is to use $(...), the subexpression operator, as shown in MikeM's helpful answer. The difference between the two approaches is that the former streams its output to the pipeline - i.e., outputs objects one by one - whereas $(...) invariably collects all output in memory, up front.
For smallish input sets this won't make much of a difference, but the in-memory collection that $(...) performs can become problematic with large input sets, so the & { ... } / . { ... } approach is generally preferable.
Arno van Boven's answer shows a simpler alternative to your do ... until loop, based on a for loop.
Combining a foreach loop with .., the range operator, is even more concise and expressive (and the cost of the array construction is usually negligible and overall still amounts to noticeably faster execution):
[int] $ready = Read-Host "How many you want?"
& {
foreach ($i in 1..$ready) {
-join (1..12 | foreach {
([char[]] (65..90 + 97..122)) + 0..9 + '.' | Get-Random
})
}
} | Out-File C:/numbers.csv -Append
The above also shows a simplification of the original command via a [char[]] cast that directly converts an array of code points to an array of characters.
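A quick illustration of that cast:
[char[]] (65..70) # -> A B C D E F
-join [char[]] (72, 105) # -> 'Hi'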
In PowerShell [Core] 7+, you could further simplify by taking advantage of Get-Random's -Count parameter:
[int] $ready = Read-Host "How many you want?"
& {
foreach ($i in 1..$ready) {
-join (
([char[]] (65..90 + 97..122)) + 0..9 + '.' | Get-Random -Count 12
)
}
} | Out-File C:/numbers.csv -Append
And, finally, you could have avoided a statement for looping altogether and used the ForEach-Object cmdlet instead (whose built-in alias, perhaps confusingly, is also foreach, but there's also %), as you're already doing inside your loop (1..12 | foreach ...):
[int] $ready = Read-Host "How many you want?"
1..$ready | ForEach-Object {
-join (1..12 | ForEach-Object {
([char[]] (65..90 + 97..122)) + 0..9 + '.' | Get-Random
})
} | Out-File C:/numbers.csv -Append
[1] In Windows PowerShell, Out-File uses UTF-16LE ("Unicode") encoding by default, so even a conceptually empty file still contains 2 bytes, namely the UTF-16LE BOM. In PowerShell [Core] v6+, BOM-less UTF-8 is the default across all cmdlets, so there you'll truly get an empty (0 bytes) file.
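You can verify the footnote's claim yourself (hypothetical path):
Out-File C:\temp\empty.txt # no input at all
(Get-Item C:\temp\empty.txt).Length # 2 in Windows PowerShell (the UTF-16LE BOM); 0 in PowerShell 6+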
Another way is to wrap the loop in a sub-expression and pipe it:
$ready = Read-Host "How many you want?: "
$i = 0
$(do{
(-join(1..12 | ForEach {((65..90)+(97..122)+(".") | % {[char]$_})+(0..9)+(".") | Get-Random}))
$i++
} until ($i -match $ready)) | Out-File C:/numbers.csv -Append
I personally avoid Do loops when I can, because I find them hard to read. Combining the two previous answers, I'd write it like this, because I find it easier to tell what is going on. Using a for loop instead, every line becomes its own self-contained piece of logic.
[int]$amount = Read-Host "How many you want?: "
& {
for ($i = 0; $i -lt $amount; $i++) {
-join(1..12 | foreach {((65..90)+(97..122)+(".") | foreach {[char]$_})+(0..9)+(".") | Get-Random})
}
} | Out-File C:\numbers.csv -Append
(Please do not accept this as an answer, this is just showing another way of doing it)
How can I get a sum from substrings in a file and place the sum at a specific position (on a different line) using PowerShell, given the following conditions:
Get the sum of the numbers from position 3 to 13 of every line that starts with the character D. Place the sum at positions 10 to 14 on the line that starts with S.
So for example, if i have this file:
F123trial text
DA00000038.95==xxx11
DA00000018.95==yyy11
DA00000018.95==zzzyy
S xxxxx
I want to get the sum of 38.95, 18.95 and 18.95 and then place the sum where the xxxxx is on the line that starts with S.
PowerShell's switch statement has powerful, but little-known features that allow you to iterate over the lines of a file (-file) and match lines by regular expressions (-regex).
Not only is switch -file convenient, it is also much faster than using cmdlets in a pipeline (see bottom section).
[double] $sum = 0
switch -regex -file file.txt {
# Note: The string to the left of each script block below ({ ... }),
# e.g., '^D', is the regex to match each line against.
# Inside the script blocks, $_ refers to the input line at hand.
# Extract number, add to sum, output the line.
'^D' { $sum += $_.Substring(2, 11); $_; continue }
# Summary line: place sum at character position 10, with 0-padding
# Note: `-replace ',', '.'` is only needed if your culture uses "," as the
# decimal mark.
'^S' { $_.Substring(0, 9) + '{0:000000000000000.00}' -f $sum -replace ',', '.'; continue }
# All other lines: pass them through.
default { $_ }
}
Note:
continue in the script blocks short-circuits further matching for the line at hand; by contrast, if you used break, no further lines would be processed.
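A small standalone demo of the difference:
switch ('a', 'b', 'c') {
  'a' { 'first'; continue } # continue: stop matching this input item, move to the next
  'b' { 'second'; break } # break: stop processing all remaining input
  default { "got: $_" }
}
# Outputs 'first' and 'second'; 'c' is never processed, due to break.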
Based on a later comment, I'm assuming you want an 18-character 0-left-padded number on the S line at character position 10.
With your sample file, the above yields:
F123trial text
DA00000038.95==xxx11
DA00000018.95==yyy11
DA00000018.95==zzzyy
S 000000000000076.85
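The 0-padded, 18-character value on the S line comes directly from the format string used above; for instance:
'{0:000000000000000.00}' -f 76.85 # -> 000000000000076.85 (15 digits + '.' + 2 decimals = 18 characters)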
Optional reading: Comparing the performance of switch -file ... to Get-Content ... | ForEach-Object ...
Running the following test script:
& {
# Create a sample file with 100K lines.
1..1e5 > ($tmpFile = [IO.Path]::GetTempFileName())
(Measure-Command { switch -file ($tmpFile) { default { $_ } } }).TotalSeconds,
(Measure-Command { get-content $tmpFile | % { $_ } }).TotalSeconds
Remove-Item $tmpFile
}
yields the following timings on my machine, for instance (the absolute numbers aren't important, but their ratio should give you a sense):
0.0578924 # switch -file
6.0417638 # Get-Content | ForEach-Object
That is, the pipeline-based solution is about 100 (!) times slower than the switch -file solution.
Digging deeper:
Frode F. points out that Get-Content is slow with large files - though its convenience makes it a popular choice - and mentions using the .NET Framework directly as an alternative:
Using [System.IO.File]::ReadAllLines(); however, given that it reads the entire file into memory, that is only an option with smallish files.
Using [System.IO.StreamReader]'s ReadLine() method in a loop.
However, use of the pipeline in itself, irrespective of the specific cmdlets used, introduces overhead. When performance matters - but only then - you should avoid it.
Here's an updated test that includes commands that use the .NET Framework methods, with and without the pipeline (the use of intrinsic .ForEach() method requires PSv4+):
& {
# Create a sample file with 100K lines.
1..1e5 > ($tmpFile = [IO.Path]::GetTempFileName())
(Measure-Command { switch -file ($tmpFile) { default { $_ } } }).TotalSeconds
(Measure-Command { foreach ($line in [IO.File]::ReadLines((Convert-Path $tmpFile))) { $line } }).TotalSeconds
(Measure-Command {
$sr = [IO.StreamReader] (Convert-Path $tmpFile)
while(-not $sr.EndOfStream) { $sr.ReadLine() }
$sr.Close()
}).TotalSeconds
(Measure-Command { [IO.File]::ReadAllLines((Convert-Path $tmpFile)).ForEach({ $_ }) }).TotalSeconds
(Measure-Command { [IO.File]::ReadAllLines((Convert-Path $tmpFile)) | % { $_ } }).TotalSeconds
(Measure-Command { Get-Content $tmpFile | % { $_ } }).TotalSeconds
Remove-Item $tmpFile
}
Sample results, from fastest to slowest:
0.0124441 # switch -file
0.0365348 # [System.IO.File]::ReadLines() in foreach loop
0.0481214 # [System.IO.StreamReader] in a loop
0.1614621 # [System.IO.File]::ReadAllLines() with .ForEach() method
0.2745749 # (pipeline) [System.IO.File]::ReadAllLines() with ForEach-Object
0.5925222 # (pipeline) Get-Content with ForEach-Object
switch -file is the fastest by a factor of around 3, followed by the no-pipeline .NET solutions; using .ForEach() adds another factor of 3.
Simply introducing the pipeline (ForEach-Object instead of .ForEach()) adds another factor of 2; finally, using the pipeline with Get-Content and ForEach-Object adds another factor of 2.
You could try:
-match to find the lines using regex-pattern
The .NET string-method Substring() to extract the values from the "D"-lines
Measure-Object -Sum to calculate the sum
-replace to insert the value (searches using regex-pattern).
Ex:
$text = Get-Content -Path file.txt
$total = $text -match '^D' |
#Foreach "D"-line, extract the value and cast to double (to be able to sum it)
ForEach-Object { $_.Substring(2,11) -as [double] } |
#Measure the sum
Measure-Object -Sum | Select-Object -ExpandProperty Sum
$text | ForEach-Object {
if($_ -match '^S') {
#Line starts with S -> Insert sum
$_.SubString(0, 17 - "$total".Length) + $total + $_.SubString(17) # "$total" stringifies the sum so .Length is its character count
} else {
#Not "S"-line -> output original content
$_
}
} | Set-Content -Path file.txt
I need to only search the 1st line and last line in a text file to find a "-" and remove it.
How can I do it?
I tried Select-String, but I don't know how to find the 1st and last line and only remove "-" from there.
Here is what the text file looks like:
% 01-A247M15 G70
N0001 G30 G17 X-100 Y-100 Z0
N0002 G31 G90 X100 Y100 Z45
N0003 ; --PART NO.: NC-HON.PHX01.COVER-SHOE.DET-1000.050
N0004 ; --TOOL: 8.55 X .3937
N0005 ;
N0006 % 01-A247M15 G70
Something like this?
$1 = Get-Content C:\work\test\01.I
$1 | select-object -index 0, ($1.count-1)
Ok, so after looking at this for a while, I decided there had to be a way to do this with a one-liner. Here it is:
(gc "c:\myfile.txt") | % -Begin {$test = (gc "c:\myfile.txt" | select -first 1 -last 1)} -Process {if ( $_ -eq $test[0] -or $_ -eq $test[-1] ) { $_ -replace "-" } else { $_ }} | Set-Content "c:\myfile.txt"
Here is a breakdown of what this is doing:
First, the aliases for those now familiar. I only put them in because the command is long enough as it is, so this helps keep things manageable:
gc means Get-Content
% means ForEach-Object
$_ is for the current pipeline value (this isn't an alias, but I thought I would define it since you said you were new)
Ok, now here is what is happening in this:
(gc "c:\myfile.txt") | --> Gets the content of c:\myfile.txt and sends it down the line
% --> Does a foreach loop (goes through each item in the pipeline individually)
-Begin {$test = (gc "c:\myfile.txt" | select -first 1 -last 1)} --> This is a begin block, it runs everything here before it goes onto the pipeline stuff. It is loading the first and last line of c:\myfile.txt into an array so we can check for first and last items
-Process {if ( $_ -eq $test[0] -or $_ -eq $test[-1] ) --> This runs a check on each item in the pipeline, checking if it's the first or the last item in the file
{ $_ -replace "-" } else { $_ } --> if it's the first or last, it does the replacement, if it's not, it just leaves it alone
| Set-Content "c:\myfile.txt" --> This puts the new values back into the file.
Please see the following sites for more information on each of these items:
Get-Content uses
Get-Content definition
Foreach
The Pipeline
Begin and Process parts of the Foreach (these are usually for custom functions, but they work in the foreach loop as well)
If ... else statements
Set-Content
So I was thinking about what if you wanted to do this to many files, or wanted to do this often. I decided to make a function that does what you are asking. Here is the function:
function Replace-FirstLast {
[CmdletBinding()]
param(
[Parameter( `
Position=0, `
Mandatory=$true)]
[String]$File,
[Parameter( `
Position=1, `
Mandatory=$true)]
[ValidateNotNull()]
[regex]$Regex,
[Parameter( `
position=2, `
Mandatory=$false)]
[string]$ReplaceWith=""
)
Begin {
$lines = Get-Content $File
} #end begin
Process {
foreach ($line in $lines) {
if ( $line -eq $lines[0] ) {
$lines[0] = $line -replace $Regex,$ReplaceWith
} #end if
if ( $line -eq $lines[-1] ) {
$lines[-1] = $line -replace $Regex,$ReplaceWith
}
} #end foreach
}#End process
end {
$lines | Set-Content $File
}#end end
} #end function
This will create a command called Replace-FirstLast. It would be called like this:
Replace-FirstLast -File "C:\myfiles.txt" -Regex "-" -ReplaceWith "NewText"
The -ReplaceWith parameter is optional; if it is blank, the match is simply removed (default value of ""). The -Regex parameter takes a regular expression to match. For information on placing this into your profile check this article
Please note: If your file is very large (several GBs), this isn't the best solution. It would cause the whole file to live in memory, which could potentially cause other issues.
try:
$txt = get-content c:\myfile.txt
$txt[0] = $txt[0] -replace '-'
$txt[$txt.length - 1 ] = $txt[$txt.length - 1 ] -replace '-'
$txt | set-content c:\myfile.txt
You can use the select-object cmdlet to help you with this, since get-content basically spits out a text file as one huge array.
Thus, you can do something like this
get-content "path_to_my_awesome_file" | select -first 1 -last 1
To remove the dash after that, you can use the -Replace switch to find the dash and remove it. This is better than using System.String.Replace(...) method because it can match regex statements and replace whole arrays of strings too!
That would look like:
# gc = Get-Content. The parens tell Powershell to do whatever's inside of it
# then treat it like a variable.
(gc "path_to_my_awesome_file" | select -first 1 -last 1) -Replace '-',''
If your file is very large you might not want to read the whole file to get the last line. gc -Tail will get the last line very quickly for you.
function GetFirstAndLastLine($path){
return New-Object PSObject -Property @{
First = Get-Content $path -TotalCount 1
Last = Get-Content $path -Tail 1
}
}
GetFirstAndLastLine "u_ex150417.log"
I tried this on a 20 GB log file and it returned immediately; reading the whole file takes hours.
You will still need to read the file if you want to keep all existing content and only remove from the end. Using -Tail is a quick way to check whether the dash is there.
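For instance, a quick guard along those lines (a sketch, reusing the sample log name from above):
if ((Get-Content 'u_ex150417.log' -Tail 1) -match '-') {
  # only now pay the cost of reading and rewriting the whole file
}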
I hope it helps.
A cleaner answer to the above:
$lines = Get-Content "path_to_ridiculously_excellent_file"
$Line_number_were_on = 0
$Awesome_file = $lines | %{
$Line = $_
if ($Line_number_were_on -eq 0 -or $Line_number_were_on -eq ($lines.Count - 1))
{ $Line -Replace '-','' }
else
{ $Line } ;
$Line_number_were_on++
}
I like one-liners, but I find that readability tends to suffer sometimes when I put terseness over function. If what you're doing is going to be part of a script that other people will be reading/maintaining, readability might be something to consider.
Following Nick's answer: I do need to do this on all text files in the directory tree and this is what I'm using now:
Get-ChildItem -Path "c:\work\test" -Filter *.i | where { !$_.PSIsContainer } | % {
$txt = Get-Content $_.FullName;
$txt[0] = $txt[0] -replace '-';
$txt[$txt.length - 1 ] = $txt[$txt.length - 1 ] -replace '-';
$txt | Set-Content $_.FullName
}
and it looks like it's working well now.
Simple process:
Replace $file_txt with your filename
Get-Content $file_txt | Select-Object -last 1
I was recently searching for comments in the last line of .bat files; they seem to mess up the error code of previous commands. I found this useful for searching for a pattern in the last line of files. PSPath is a hidden property that Get-Content outputs; if I used Select-String, I would lose the filename. *.bat gets passed as -Filter for speed.
get-childitem -recurse . *.bat | get-content -tail 1 | where { $_ -match 'rem' } |
select pspath
PSPath
------
Microsoft.PowerShell.Core\FileSystem::C:\users\js\foo\file.bat
If there is a file, for example test.config, that contains the word "WARN" between lines 140 and 170 (the word "WARN" also appears on other lines), I want to replace "WARN" with "DEBUG" between lines 140 and 170 only, keeping the remaining text of the file the same, so that when the file is saved, "WARN" is replaced by "DEBUG" only between lines 140 and 170 and all other text is unaffected.
Look at $_.ReadCount, which will help. Just as an example, I replace only rows 10-15.
$content = Get-Content c:\test.txt
$content |
ForEach-Object {
if ($_.ReadCount -ge 10 -and $_.ReadCount -le 15) {
$_ -replace '\w+','replaced'
} else {
$_
}
} |
Set-Content c:\test.txt
After that, the file will contain:
1
2
3
4
5
6
7
8
9
replaced
replaced
replaced
replaced
replaced
replaced
16
17
18
19
20
2 Lines:
$FileContent = Get-Content "C:\Some\Path\textfile.txt"
$FileContent | % { If ($_.ReadCount -ge 140 -and $_.ReadCount -le 170) {$_ -Replace "WARN","DEBUG"} Else {$_} } | Set-Content -Path "C:\Some\Path\textfile.txt"
Description:
Write content of text file to array "$FileContent"
Pipe the $FileContent array to the ForEach-Object cmdlet ("%")
For each item in array, check Line number ($_.ReadCount)
If Line number 140-170, Replace WARN with DEBUG; otherwise write line unmodified.
NOTE: You MUST add the "Else {$_}". Otherwise the text file will only contain the modified lines.
Set-Content to write the content to text file
Using array slicing:
$content = Get-Content c:\test.txt
$out = @()
$out += $content[0..139]
$out += $content[140..168] -replace "warn","DEBUG"
$out += $content[169..($content.count -1)]
$out | out-file out.txt
This is the test file
text
text
WARN
WARN
TEXT
--
PS:\ gc .\stuff1.txt |% { [system.text.regularexpressions.regex]::replace($_,"WARN","DEBUG") } > out.txt
Out.txt looks like this:
text
text
DEBUG
DEBUG
TEXT
Might be trivial but it does the job:
$content = gc "D:\posh\stack\test.txt"
$start=139
$end=169
$content | % {$i=0;$lines=@();}{
if($i -ge $start -and $i -le $end){
$lines+=$_ -replace 'WARN', 'DEBUG'
}
else
{
$lines+=$_
}
$i+=1
}{set-content test_output.txt $lines}
So my script is pretty similar, so I am going to post what I ended up doing.
I had a bunch of servers all with the same script in the same location, and I needed to update a path in all of the scripts.
I just replaced the entire line (line 3 in this script) and rewrote the script back out.
My server names and the "paths" to replace the old path were stored in an array (you could pull that from a DB if you wanted to automate it more):
$servers = @("Server1","Server2")
$Paths = @("\\NASSHARE\SERVER1\Databackups","\\NASSHARE\SERVER2\Databackups")
$a = 0
foreach ($x in $servers)
{
$dest = "\\" + $x + "\e$\Powershell\Backup.ps1"
$newline = '$backupNASPath = "' + $Paths[$a] + '"'
$lines = @(Get-Content $dest)
$lines[3] = $newline
$lines > $dest
$a++
}
it works, and saved me a ton of time logging into each server and updating each path. ugh
Cheers