PowerShell - Count number of carriage return line feeds (CRLF) in a .txt file

I have a large text file (output from a SQL db) and I need to determine the row count. However, since the source SQL data itself contains carriage returns \r and line feeds \n (NEVER appearing together), the data for some rows spans multiple lines in the output .txt file. The PowerShell I'm using below gives me the file line count, which is greater than the actual SQL row count. So I need to modify the script to ignore the additional lines. One way of doing it might be to just count the number of times CRLF or \r\n occurs (TOGETHER) in the file; that should be the actual number of rows, but I'm not sure how to do it.
Get-ChildItem "." |% {$n = $_; $c = 0; Get-Content -Path $_ -ReadCount 1000 |% { $c += $_.Count }; "$n; $c"} > row_count.txt

I just learned myself that Get-Content splits and streams each line in a file on CR, CRLF, and LF, so that it can read data interchangeably between operating systems:
"1`r2`n3`r`n4" | Out-File .\Test.txt
(Get-Content .\Test.txt).Count
4
Reading it again, I might have misunderstood your question.
In any case, if you want to split (count) on only a specific character combination:
CR
((Get-Content -Raw .\Test.txt).Trim() -Split '\r').Count
3
LF
((Get-Content -Raw .\Test.txt).Trim() -Split '\n').Count
3
CRLF
((Get-Content -Raw .\Test.txt).Trim() -Split '\r\n').Count # or: -Split [Environment]::NewLine
2
Note the .Trim() method, which removes the trailing newline (whitespace) at the end of the file (appended by Out-File) that Get-Content -Raw preserves.
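For example, on Windows, where Out-File appends a trailing CRLF, omitting .Trim() yields an extra (empty) element:
((Get-Content -Raw .\Test.txt) -Split '\r\n').Count # 3, because of the trailing CRLF
((Get-Content -Raw .\Test.txt).Trim() -Split '\r\n').Count # 2
Alternatively, instead of splitting, you can count the CRLF sequences directly, which maps straight onto the row-count goal (a minimal sketch; if the file does not end in a trailing CRLF, add 1):
[regex]::Matches((Get-Content -Raw .\Test.txt), "`r`n").Count # 2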
Addendum
(Update based on the comment on the memory exception)
I am afraid that there is currently no other option than building your own StreamReader using the ReadBlock method and specifically splitting lines on CRLF. I have opened a feature request for this issue: -NewLine Parameter to customize line separator for Get-Content
Get-Lines
A possible way to workaround the memory exception errors:
function Get-Lines {
    [CmdletBinding()][OutputType([string])] param(
        [Parameter(ValueFromPipeLine = $True)][string] $Filename,
        [String] $NewLine = [Environment]::NewLine
    )
    Begin {
        [Char[]] $Buffer = New-Object Char[] 10
        $Reader = New-Object -TypeName System.IO.StreamReader -ArgumentList (Get-Item($Filename))
        $Rest = '' # Note that a multiple-character newline (as CRLF) could be split at the end of the buffer
    }
    Process {
        While ($True) {
            $Length = $Reader.ReadBlock($Buffer, 0, $Buffer.Length)
            if (!$Length) { Break }
            $Split = ($Rest + [string]::new($Buffer[0..($Length - 1)])) -Split $NewLine
            If ($Split.Count -gt 1) { $Split[0..($Split.Count - 2)] }
            $Rest = $Split[-1]
        }
    }
    End {
        $Rest
    }
}
Usage
To prevent the memory exceptions it is important that you do not assign the results to a variable or use parentheses, as this will stall the PowerShell pipeline and store everything in memory.
$Count = 0
Get-Lines .\Test.txt | ForEach-Object { $Count++ }
$Count
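Note that [Environment]::NewLine, the default, is CRLF only on Windows; on Linux and macOS it is LF. To count CRLF-terminated rows regardless of platform, pass the separator explicitly (a usage sketch of the same function):
$Count = 0
Get-Lines .\Test.txt -NewLine "`r`n" | ForEach-Object { $Count++ }
$Count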

The System.IO.StreamReader.ReadBlock solution in iRon's helpful answer, which reads the file in fixed-size blocks and performs custom splitting into lines, is the best choice, because it both avoids out-of-memory problems and performs well (by PowerShell standards).
If performance in terms of execution speed isn't paramount, you can take advantage of
Get-Content's -Delimiter parameter, which accepts a custom string to split the file content by:
# Outputs the count of CRLF-terminated lines.
(Get-Content largeFile.txt -Delimiter "`r`n" | Measure-Object).Count
Note that -Delimiter employs optional-terminator logic when splitting: that is, if the file content ends in the given delimiter string, no extra, empty element is reported at the end.
This is consistent with the default behavior, where a trailing newline in a file is considered an optional terminator that does not result in an additional, empty line getting reported.
However, if a -Delimiter string that is unrelated to newline characters is used, a trailing newline is considered a final "line" (element).
A quick example:
# Create a test file without a trailing newline.
# Note the CR-only newline (`r) after 'line 1'
"line1`rrest of line1`r`nline2" | Set-Content -NoNewLine test1.txt
# Create another test file with the same content plus
# a trailing CRLF newline.
"line1`rrest of line1`r`nline2`r`n" | Set-Content -NoNewLine test2.txt
'test1.txt', 'test2.txt' | ForEach-Object {
    "--- $_"
    # Split by CRLF only and enclose the resulting lines in [...]
    Get-Content $_ -Delimiter "`r`n" |
        ForEach-Object { "[{0}]" -f ($_ -replace "`r", '`r') }
}
This yields:
--- test1.txt
[line1`rrest of line1]
[line2]
--- test2.txt
[line1`rrest of line1]
[line2]
As you can see, the two test files were processed identically, because the trailing CRLF newline was considered an optional terminator for the last line.

Related

Re-assembling split file names with Powershell

I'm having trouble re-assembling certain filenames (and discarding the rest) from a text file. The filenames are split up (usually on three lines) and there is always a blank line after each filename. I only want to keep filenames that begin with OPEN or FOUR. An example is:
OPEN.492820.EXTR
A.STANDARD.38383
333
FOUR.383838.282.
STAND.848484.NOR
MAL.3939
CLOSE.3480384.ST
ANDARD.39393939.
838383
The output I'd like would be:
OPEN.492820.EXTRA.STANDARD.38383333
FOUR.383838.282.STAND.848484.NORMAL.3939
Thanks for any suggestions!
The following worked for me, you can give it a try.
See https://regex101.com/r/JuzXOb/1 for the Regex explanation.
$source = 'fullpath/to/inputfile.txt'
$destination = 'fullpath/to/resultfile.txt'
[regex]::Matches(
    (Get-Content $source -Raw),
    '(?msi)^(OPEN|FOUR)(.*?|\s*?)+([\r\n]$|\z)'
).Value.ForEach({ -join ($_ -split '\r?\n').ForEach('Trim') }) |
    Out-File $destination
For testing:
$txt = @'
OPEN.492820.EXTR
A.STANDARD.38383
333
FOUR.383838.282.
STAND.848484.NOR
MAL.3939
CLOSE.3480384.ST
ANDARD.39393939.
838383
OPEN.492820.EXTR
A.EXAMPLE123
FOUR.383838.282.
STAND.848484.123
ZXC
'@
[regex]::Matches(
    $txt,
    '(?msi)^(OPEN|FOUR)(.*?|\s*?)+([\r\n]$|\z)'
).Value.ForEach({ -join ($_ -split '\r?\n').ForEach('Trim') })
Output:
OPEN.492820.EXTRA.STANDARD.38383333
FOUR.383838.282.STAND.848484.NORMAL.3939
OPEN.492820.EXTRA.EXAMPLE123
FOUR.383838.282.STAND.848484.123ZXC
Read the file one line at a time and keep concatenating them until you encounter a blank line, at which point you output the concatenated string and repeat until you reach the end of the file:
# this variable will keep track of the partial file names
$fileName = ''
# use a switch to read the file and process each line
switch -Regex -File ('path\to\file.txt') {
    # when we see a blank line...
    '^\s*$' {
        # ... we output the collected name if it starts with the right word
        if($fileName -cmatch '^(OPEN|FOUR)'){ $fileName }
        # and then start over
        $fileName = ''
    }
    default {
        # must be a non-blank line, concatenate it to the previous ones
        $fileName += $_
    }
}
# remember to check and output the last one
if($fileName -cmatch '^(OPEN|FOUR)'){
    $fileName
}

Powershell - Read, Alter Line, Write new line over old in Text Document

I am reading in line by line of a text file. If I see a specific string, I locate the first and last of a specific character, use two substrings to create a smaller string, then replace the line in the text file.
The difficult part: I have the line of text stored in a variable, but cannot figure out how to write this new line over the old line in the text document.
Excuse the crude code - I have been testing things and only started playing with PowerShell a few hours ago.
foreach($line in [System.IO.File]::ReadLines("C:\BatchPractice\test.txt"))
{
    Write-Output $line
    if ($line.Contains("dsa")) {
        Write-Output "TRUEEEEE"
    }
    $positionF = $line.IndexOf("\")+1
    $positionL = $line.LastIndexOf("\")+1
    $lengthT = $line.Length
    Write-Output ($positionF)
    Write-Output $positionL
    Write-Output $lengthT
    if($line.Contains("\")){
        Write-Output "Start"
        $combine = $line.Substring(0,$positionF-1) + $line.Substring($positionL,($lengthT-$positionL))
        Write-Output $combine
        $line1 = $line.Substring(0,$positionF-1)
        $line2 = $line.Substring($positionL,($lengthT-$positionL))
        $combined = $line1 + $line2
        Write-Output $combined
        Write-Output "Close"
    }
}
You can read the file into an array with Get-Content and write it back with Set-Content:
$file=(Get-Content "C:\BatchPractice\test.txt")
Then you can edit it like arrays:
$file[LINE_NUMBER]="New line"
Where LINE_NUMBER is the line number starting from 0.
And then overwrite to file:
$file|Set-Content "C:\BatchPractice\test.txt"
You can implement this in code: create a variable $i = 0 and increment it at the end of the loop; $i will then be the line number at each iteration, as sketched below.
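Put together, a minimal sketch of that loop (the "dsa" test and substring logic are taken from your code above and may need adjusting; the path is a placeholder):
# @() guards against a single-line file, where Get-Content returns a plain string
$file = @(Get-Content "C:\BatchPractice\test.txt")
$i = 0
foreach ($line in $file) {
    if ($line.Contains("dsa") -and $line.Contains("\")) {
        $positionF = $line.IndexOf("\") + 1
        $positionL = $line.LastIndexOf("\") + 1
        # overwrite the current line in the array with the shortened string
        $file[$i] = $line.Substring(0, $positionF - 1) + $line.Substring($positionL)
    }
    $i++
}
$file | Set-Content "C:\BatchPractice\test.txt"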
HTH
Based on your code it seems you want to take any line that contains 'dsa' and remove the contents after the first backslash up until the last backslash. If that's the case I'd recommend simplifying your code with regex. First I made a sample file since none was provided.
$tempfile = New-TemporaryFile
@'
abc\def\ghi\jkl
abc\dsa\ghi\jkl
zyx\vut\dsa\srq
zyx\vut\srq\pon
'@ | Set-Content $tempfile -Encoding UTF8
Now we will read in all lines (unless this is a massive file)
$text = Get-Content $tempfile -Encoding UTF8
Next we'll make a regex object with the pattern we want to replace. The double backslash is to escape the backslash since it has meaning to regex.
$regex = [regex]'(?<=.+\\).+\\'
Now we will loop over every line, if it has dsa in it we will run the replace against it, otherwise we will output the line.
$text | ForEach-Object {
    if($_.contains('dsa'))
    {
        $regex.Replace($_,'')
    }
    else
    {
        $_
    }
} -OutVariable newtext
You'll see the output on the screen, but it's also captured in the $newtext variable. I recommend ensuring it is the output you are after prior to writing.
abc\def\ghi\jkl
abc\jkl
zyx\srq
zyx\vut\srq\pon
Once confirmed, simply write it back to the file.
$newtext | Set-Content $tempfile -Encoding UTF8
You can obviously combine the steps as well.
$text | ForEach-Object {
    if($_.contains('dsa'))
    {
        $regex.Replace($_,'')
    }
    else
    {
        $_
    }
} | Set-Content $tempfile -Encoding UTF8

Multiline data: Removing LF (but not CRLF) from CSV using Powershell

I have some CSV data I need to clean up by removing inline linefeeds and special characters like typographic quotes. I feel like I could get this working with Python or Unix utils, but I'm stuck on a pretty vanilla Windows 2012 box, so I'm giving PowerShell v5 a shot despite my lack of experience with it.
Here's what I'm looking to achieve:
$InputFile:
"INCIDENT_NUMBER","FIRST_NAME","LAST_NAME","DESCRIPTION"{CRLF}
"00020306","John","Davis","Employee was not dressed appropriately."{CRLF}
"00020307","Brad","Miller","Employee told customer, ""Go shop somewhere else!"""{CRLF}
"00020308","Ted","Jones","Employee told supervisor, “That’s not my job”"{CRLF}
"00020309","Bob","Meyers","Employee did the following:{LF}
• Showed up late{LF}
• Did not complete assignments{LF}
• Left work early"{CRLF}
"00020310","John","Davis","Employee was not dressed appropriately."{CRLF}
$OutputFile:
"INCIDENT_NUMBER","FIRST_NAME","LAST_NAME","DESCRIPTION"{CRLF}
"00020307","Brad","Miller","Employee told customer, ""Go shop somewhere else!"""{CRLF}
"00020308","Ted","Jones","Employee told supervisor, ""That's not my job"""{CRLF}
"00020309","Bob","Meyers","Employee did the following: * Showed up late * Did not complete assignments * Left work early"{CRLF}
"00020310","John","Davis","Employee was not dressed appropriately."{CRLF}
The following code works:
(Get-Content $InputFile -Raw) `
-replace '(?<!\x0d)\x0a',' ' `
-replace "[‘’´]","'" `
-replace '[“”]','""' `
-replace "\xa0"," " `
-replace '[•·]','*' | Set-Content $OutputFile -Encoding ASCII
However, the actual data I'm dealing with is a 4GB file with over a million lines. Get-Content -Raw runs out of memory. I tried Get-Content -ReadCount 10000, but that removes all linefeeds, presumably because it reads line-wise.
More Googling brought me to Import-Csv which I got from here:
Import-Csv $InputFile | ForEach {
    $_.notes = $_.notes -replace '(?<!\x0d)\x0a',' '
    $_
} | Export-Csv $OutputFile -NoTypeInformation -Encoding ASCII
but I don't appear to have a notes property on my objects:
Exception setting "notes": "The property 'notes' cannot be found on this object. Verify that the property exists and can be set."
At C:\convert.ps1:53 char:5
+ $_.notes= $_.notes -replace '(?<!\x0d)\x0a',' '
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [], SetValueInvocationException
+ FullyQualifiedErrorId : ExceptionWhenSetting
I found another example using the Value property, but I got the same error.
I tried running Get-Member on each object and it looks like it's assigning properties based on the header from the file, like I may be able to get it with $_.DESCRIPTION, but I don't know enough PowerShell to run the replacements on all of the properties :(
Please help? Thanks!
Update:
I ended up giving up on PS and coding this in AutoIT. It's not great, and it will be more difficult to maintain, especially since there hasn't been a new release in 2.5 years. But it works, and it crunches the prod file in 4 minutes.
Unfortunately, I couldn't key on the LF easily either, so I ended up going with the logic to create new lines based on ^"[^",] (Line starts with a quote and the second character is not a quote or comma).
Here's the AutoIT code:
#include <FileConstants.au3>

If $CmdLine[0] <> 2 Then
    ConsoleWriteError("Error in parameters" & @CRLF)
    Exit 1
EndIf

Local Const $sInputFilePath = $CmdLine[1]
Local Const $sOutputFilePath = $CmdLine[2]

ConsoleWrite("Input file: " & $sInputFilePath & @CRLF)
ConsoleWrite("Output file: " & $sOutputFilePath & @CRLF)
ConsoleWrite("***** WARNING *****" & @CRLF)
ConsoleWrite($sOutputFilePath & " is being OVERWRITTEN!" & @CRLF & @CRLF)

Local $bFirstLine = True
Local $hInputFile = FileOpen($sInputFilePath, $FO_ANSI)
If $hInputFile = -1 Then
    ConsoleWriteError("An error occurred when reading the file.")
    Exit 1
EndIf

Local $hOutputFile = FileOpen($sOutputFilePath, $FO_OVERWRITE + $FO_ANSI)
If $hOutputFile = -1 Then
    ConsoleWriteError("An error occurred when opening the output file.")
    Exit 1
EndIf

ConsoleWrite("Processing..." & @CRLF)

While True
    $sLine = FileReadLine($hInputFile)
    If @error = -1 Then ExitLoop
    ;Replace typographic single quotes and backtick with apostrophe
    $sLine = StringRegExpReplace($sLine, "[‘’´]", "'")
    ;Replace typographic double quotes with normal quote (doubled for in-field CSV)
    $sLine = StringRegExpReplace($sLine, '[“”]', '""')
    ;Replace bullet and middot with asterisk
    $sLine = StringRegExpReplace($sLine, '[•·]', '*')
    ;Replace non-breaking space (0xA0) and delete (0x7F) with space
    $sLine = StringRegExpReplace($sLine, "[\xa0\x7f]", " ")
    If $bFirstLine = False Then
        If StringRegExp($sLine, '^"[^",]') Then
            $sLine = @CRLF & $sLine
        Else
            $sLine = " " & $sLine
        EndIf
    Else
        $bFirstLine = False
    EndIf
    FileWrite($hOutputFile, $sLine)
WEnd

ConsoleWrite("Done!" & @CRLF)
FileClose($hInputFile)
FileClose($hOutputFile)
The first answer may be better than this, as I'm not sure if PS needs to load everything into memory this way or not (though I think it does), but going off what you started above, I was thinking along this line...
# Import CSV into a variable
$InputFile = Import-Csv $InputFilePath

# Gets all field names, stores in $Fields
$InputFile | Get-Member -MemberType NoteProperty |
    Select-Object Name | Set-Variable Fields

# Updates each field entry
$InputFile | ForEach-Object {
    $thisLine = $_
    $Fields | ForEach-Object {
        ($thisLine).($_.Name) = ($thisLine).($_.Name) `
            -replace '(?<!\x0d)\x0a',' ' `
            -replace "[‘’´]","'" `
            -replace '[“”]','""' `
            -replace "\xa0"," " `
            -replace '[•·]','*'
    }
    $thisLine | Export-Csv $OutputFile -NoTypeInformation -Encoding ASCII -Append
}
Here's another "line-by-line" attempt, somewhat akin to mklement0's answer. It assumes that no "row-continuation" line will begin with ". Hopefully it performs much better!
# Clear contents of file (Not sure if you need/want this...)
if (Test-Path -Type Leaf $OutputFile) { Clear-Content $OutputFile }

# Flag for first entry, since no data manipulation needed there
$firstEntry = $true

foreach ($line in [System.IO.File]::ReadLines($InputFile)) {
    if ($firstEntry) {
        Add-Content -Path $OutputFile -Value $line -NoNewline
        $firstEntry = $false
    }
    else {
        if ($line[0] -eq '"') { Add-Content -Path $OutputFile "`r`n" -NoNewline }
        else { Add-Content -Path $OutputFile " " -NoNewline }
        $sanitizedLine = $line -replace '(?<!\x0d)\x0a',' ' `
            -replace "[‘’´]","'" `
            -replace '[“”]','""' `
            -replace "\xa0"," " `
            -replace '[•·]','*'
        Add-Content -Path $OutputFile -Value $sanitizedLine -NoNewline
    }
}
The technique is based on this other answer and its comments: https://stackoverflow.com/a/47146987/7649168
(Also thanks to mklement0 for explaining the performance issues of my previous answer.)
Note:
See my other answer for a robust solution.
The answer below may still be of interest for a general line-by-line processing solution that performs well, although it invariably treats LF-only instances as line separators too (it has been updated to use the same regex you use in the AutoIt solution you've added to the question to distinguish between a line that starts a row and one that continues it).
Given the size of your file, I suggest sticking with plain-text processing for performance reasons:
The switch statement enables fast line-by-line processing; it recognizes both CRLF and LF as newlines, as PowerShell generally does. Note, however, given that each line returned has its trailing newline stripped, you won't be able to tell whether the input line ended in CRLF or just LF.
Using a .NET type directly, System.IO.StreamWriter, bypasses the pipeline and enables fast writes to the output file.
For general PowerShell performance tips, see this answer.
$inputFile = 'in.csv'
$outputFile = 'out.csv'

# Create a stream writer for the output file.
# Default to BOM-less UTF-8, but you can pass a [System.Text.Encoding]
# instance as the second argument.
# Note: Pass a *full* path, because .NET's working dir. usually differs from PowerShell's
$outFileWriter = [System.IO.StreamWriter]::new("$PWD/$outputFile")

# Use a `switch` statement to read the input file line by line.
$outLine = ''
switch -File $inputFile -Regex {
    '^"[^",]' { # (Start of) a new row.
        if ($outLine) { # write previous, potentially synthesized line
            $outFileWriter.WriteLine($outLine)
        }
        $outLine = $_ -replace "[‘’´]", "'" -replace '[“”]', '""' -replace '\u00a0', ' '
    }
    default { # Continuation of a row.
        $outLine += ' ' + $_ -replace "[‘’´]", "'" -replace '[“”]', '""' -replace '\u00a0', ' ' `
            -replace '[•·]', '*' -replace '\n'
    }
}
# Write the last line.
$outFileWriter.WriteLine($outLine)
$outFileWriter.Close()
Note: The above assumes that no row continuation also matches regex pattern '^"[^",]', which is hopefully robust enough (you've deemed it to be, given that you based your AutoIt solution on it).
This simple distinction between the start of a row and continuations on subsequent lines obviates the need for lower-level file I/O in order to distinguish between CRLF and LF newlines, which my other answer does.
The following two approaches would work in principle, but are too slow with a large input file such as yours.
Object-oriented processing with Import-Csv / Export-Csv:
Use Import-Csv to parse the CSV into objects, modify the objects' DESCRIPTION property values, then reexport with Export-Csv. Since the row-internal LF-only newlines are inside double-quoted fields, they are recognized as being part of the same row.
While a robust and conceptually elegant approach, it is by far the slowest and also very memory-intensive - see GitHub issue #7603, which discusses the reasons, and GitHub feature request #11027 to improve the situation by outputting hashtables rather than custom objects ([pscustomobject]).
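For reference, a minimal sketch of what this approach could look like for the sample data above (note that the typographic double quotes map to a single " here, because Export-Csv itself doubles embedded quotes on output; row-internal newlines are LF-only, so no lookbehind is needed):
Import-Csv $InputFile | ForEach-Object {
    $_.DESCRIPTION = $_.DESCRIPTION -replace '\n', ' ' -replace "[‘’´]", "'" -replace '[“”]', '"' -replace '[•·]', '*'
    $_
} | Export-Csv $OutputFile -NoTypeInformation -Encoding ASCII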
Plain-text processing with Get-Content / Set-Content:
Use Get-Content -Delimiter "`r`n" to split the text file into lines by CRLF only, not also LF, transform each line as needed and save it to the output file with Set-Content.
While you pay a performance penalty for the conceptual elegance of using the pipeline in general, which makes saving the results with Set-Content line by line somewhat slow, Get-Content is especially slow, because it decorates each output string (line) with additional properties about the originating file, which is costly. See the green-lighted, but not yet implemented GitHub feature request #7537 to improve performance (and memory use) by omitting this decoration.
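A sketch of this slower but simple variant (the TrimEnd call strips any trailing CR/LF characters, including the delimiter if your PowerShell version leaves it attached to each chunk; since every row ends in a closing quote, no data is lost):
Get-Content $InputFile -Delimiter "`r`n" | ForEach-Object {
    $_.TrimEnd("`r", "`n") -replace '\n', ' ' -replace "[‘’´]", "'" -replace '[“”]', '""' -replace '[•·]', '*'
} | Set-Content $OutputFile -Encoding ASCII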
Solution:
For performance reasons, direct use of .NET APIs is therefore required.
Note: If the PowerShell solution should still be too slow, consider creating a helper class via ad-hoc compilation of C# code using Add-Type; ultimately, of course, using only compiled code will perform best.
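For instance, something along these lines; LineFixer is a hypothetical helper shown only to illustrate the Add-Type technique, handling just the LF-to-space part of the transformation:
Add-Type -TypeDefinition @'
public static class LineFixer {
    // Replace LF-only newlines (those not preceded by CR) with a space.
    public static string FixBlock(string block) {
        return System.Text.RegularExpressions.Regex.Replace(block, "(?<!\r|^)\n", " ");
    }
}
'@
[LineFixer]::FixBlock("a`nb`r`nc")  # -> "a b" + CRLF + "c"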
While there is no direct equivalent to Get-Content -Delimiter "`r`n", you can read text files in fixed-size blocks (arrays) of characters, using the System.IO.StreamReader.ReadBlock() method (.NET Framework 4.5+ / .NET Core 1+), on which you can then perform the desired transformations, as shown below.
Note:
For best performance, choose a high $BUFSIZE value below to minimize the number of reads and processing iterations; obviously, the value must be chosen so that you don't run out of memory.
There's not even a need to parse the blocks read into CRLF newlines, because you can simply target the LF-only lines with a regex that is a modified version of the one from your original approach, '(?<!\r|^)\n' (see code comments below).
For brevity, error handling is omitted, but the .Close() calls to close the files should generally be placed in the finally block of a try / catch / finally statement.
# In- and output file paths.
# Note: Be sure to use *full* paths, because .NET's working dir. usually
# differs from PowerShell's.
$inFile = "$PWD/in.csv"
$outFile = "$PWD/out.csv"

# How many characters to read at once.
# This is a tradeoff between execution speed and memory use.
$BUFSIZE = 100MB
$buf = [char[]]::new($BUFSIZE)

$inStream = [IO.StreamReader]::new($inFile)
$outStream = [IO.StreamWriter]::new($outFile)

# Process the file in fixed-size blocks of characters.
while ($charsRead = $inStream.ReadBlock($buf, 0, $BUFSIZE)) {
    # Convert the array of chars. to a string.
    $block = [string]::new($buf, 0, $charsRead)
    # Transform the block and write it to the output file.
    $outStream.Write(
        # Transform block-internal LF-only newlines to spaces and perform other
        # substitutions.
        # Note: The |^ part inside the negative lookbehind is to deal with the
        # case where the block starts with "`n" due to the block boundaries
        # accidentally having split a CRLF sequence.
        ($block -replace '(?<!\r|^)\n', ' ' -replace "[‘’´]", "'" -replace '[“”]', '""' -replace '\u00a0', ' ' -replace '[•·]', '*')
    )
}
$inStream.Close()
$outStream.Close()

Check if the last line in a file is empty in PowerShell

I want to add some content to a file in a new line.
But Add-Content appends the string to the last line if there is no newline symbol at the end.
E.g. if I want to add AAA string and if I have a file file1.txt
my last line(last cursor position here)
the result will be
my last lineAAA
On the other hand, if I use file2.txt
my last line
(last cursor position here)
the command will result in
my last line
AAA
So I need to check if the last line is empty or not. If it's not empty I will just add `n symbol to the string.
But if I run the commands
$lastLine = get-content $filename -Tail 1
if($lastLine.Length -ne 0) { ... }
it will always return the length of a non-empty string, even if my last line contains no symbols.
How can I check if my last line is empty?
You could opt to rewrite the file once so the new line is appended correctly regardless of any trailing newline; for the first line to add, do:
$file = 'D:\Test\Blah.txt'
$firstLine = 'AAA'
Set-Content -Path $file -Value ("{0}`r`n{1}" -f (Get-Content -Path $file -Raw).TrimEnd(), $firstLine)
After that first line, you can simply keep using Add-Content, which always appends a newline (unless you tell it not to do that with switch -NoNewline).
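A quick demonstration of that behavior:
Set-Content .\demo.txt -Value 'first'   # writes 'first' plus a trailing newline
Add-Content .\demo.txt -Value 'second'  # therefore lands on its own line
Get-Content .\demo.txt                  # -> first, second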
Seeing your comment, you can test the length of the last line like this:
$file = 'D:\Test\Blah.txt'
$lastLine = ((Get-Content -Path $file -Raw) -split '\r?\n')[-1]
# $lastLine.Length --> 0
if($lastLine.Length -ne 0) { ... }
The -Raw switch tells Get-Content to read the file as a whole into a single string. Split this string into separate lines with -split '\r?\n' and you'll get an array, including the last empty line.
When you use "Get-Content -Tail 1", it will always recover the last "non empty" line.
# -----------------
# Your method returns the same line even if the file contains an empty line at the end of the file
# -----------------
$lastEmptyLine = Get-Content "test_EmptyLine.txt" -Tail 1
$lastNonEmptyLine = Get-Content "test_NonEmptyLine.txt" -Tail 1
($lastEmptyLine -match '(?<=\r\n)\z')
#False
($lastNonEmptyLine -match '(?<=\r\n)\z')
#False
So if you want to keep the "Test" method (and not simply use Add-Content), you could use the following method:
# -----------------
# This method can tell you if a file finishes by an empty line or not
# -----------------
$contentWithEmptyLine = [IO.File]::ReadAllText("test_EmptyLine.txt")
$contentWithoutEmptyLine = [IO.File]::ReadAllText("test_NonEmptyLine.txt")
($contentWithEmptyLine -match '(?<=\r\n)\z')
#True
($contentWithoutEmptyLine -match '(?<=\r\n)\z')
#False
# -----------------
# You can also use Get-Content with Raw option
# -----------------
$rawContentWithEmptyLine = Get-Content "test_EmptyLine.txt" -Raw
$rawContentWithoutEmptyLine = Get-Content "test_NonEmptyLine.txt" -Raw
($rawContentWithEmptyLine -match '(?<=\r\n)\z')
#True
($rawContentWithoutEmptyLine -match '(?<=\r\n)\z')
#False
-Raw: Ignores newline characters and returns the entire contents of a file in one string with the newlines preserved. By default, newline characters in a file are used as delimiters to separate the input into an array of strings. This parameter was introduced in PowerShell 3.0.
References:
Get-Content (Microsoft.PowerShell.Management)
Check CRLF at the end of every file
about_Comparison_Operators - PowerShell
Regular expression - Wikipedia

Find and Replace character only in certain column positions in each line

I'm trying to write a script to find all the periods in the first 11 characters or last 147 characters of each line (lines are fixed width of 193, so I'm attempting to ignore characters 12 through 45).
First I want a script that will just find all the periods in the first or last part of each line, but then if I find them I would like to replace all those periods with 0's, ignoring periods in character positions 12 through 45 and leaving those in place. It would scan all the *.dat files in the directory and create period-free copies in a subfolder. So far I have:
$data = Get-Content "*.dat"
foreach($line in $data)
{
    $line.Substring(0,12)
    $line.Substring(46,147)
}
Then I run this with > Output.txt and then do a select-string Output.txt -pattern ".". As you can see I'm a long way from my goal, as presently my program is mashing all the files together, and I haven't figured out how to do any replacement yet.
Get-Item *.dat |
    ForEach-Object {
        $file = $_
        $_ |
            Get-Content |
            ForEach-Object {
                # First 11 characters (indices 0-10): replace periods
                $beginning = $_.Substring(0,11) -replace '\.','0'
                # Middle 35 characters (indices 11-45): left untouched
                $middle = $_.Substring(11,35)
                # Last 147 characters (indices 46-192): replace periods
                # (11 + 35 + 147 = 193, the fixed line width)
                $end = $_.Substring(46,147) -replace '\.','0'
                '{0}{1}{2}' -f $beginning,$middle,$end
            } |
            Set-Content -Path (Join-Path $OutputDir $file.Name)
    }
You can use the PowerShell -replace operator to replace the "." with "0". Then use Substring, as you do, to build up the three portions of the string you're interested in and produce the updated string. This will output an updated line for each line of your input.
$data = Get-Content "*.dat"
foreach($line in $data)
{
    # Replace periods only in the first 11 and last 147 characters;
    # the middle 35 characters (indices 11-45) pass through unchanged.
    ($line.Substring(0,11) -replace "\.","0") + $line.Substring(11,35) + ($line.Substring(46,147) -replace "\.","0")
}
Note that the -replace operator performs a regular expression match, and the "." is a special regular expression character, so you need to escape it with a "\".
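A quick demonstration of the difference:
'1.2.3' -replace '.', '0'   # 00000 (the unescaped dot matches every character)
'1.2.3' -replace '\.', '0'  # 10203 (only the periods are replaced)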