How to preserve newlines when using variables in here-strings - powershell

When running the following code:
$txt = Get-Content file1.txt
$a = @"
-- file start --
$txt
-- file end --
"@
$a
All new lines are removed from the file's contents, but just running
$txt
prints out the file without stripping the new lines.
Any idea how to get it to work as desired using the here-string?
Thanks!

If you put an array in a string it will be expanded with $OFS (or a space if $OFS is $null) between the items. You can see the same effect with either
"$txt"
''+$txt
and a few others. You can set $OFS="`r`n", which changes the separator the items are joined with from a space to a line break.
You could also change the Get-Content at the start to either
$txt = Get-Content file1.txt | Out-String
$txt = [IO.File]::ReadAllText((Join-Path $pwd file1.txt))
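As a minimal sketch of the joining behavior described above, assuming $OFS has not been set in the session (the sample array is a hypothetical stand-in for the output of Get-Content, not the real file):

```powershell
# Stand-in for: $txt = Get-Content file1.txt (an array with one string per line)
$txt = 'line 1', 'line 2', 'line 3'

# Default expansion joins the elements with a space (when $OFS is unset).
$joinedDefault = "$txt"

# Joining explicitly inside a sub-expression keeps one element per line
# without changing $OFS for the rest of the session.
$a = @"
-- file start --
$($txt -join "`r`n")
-- file end --
"@
$a
```

Using -join inside $( ) is an alternative to setting $OFS when you only want to affect a single string.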

Pipe $txt to Out-String inside a sub-expression.
$a = @"
-- file start --
$($txt | Out-String)
-- file end --
"@

Related

Powershell - Count number of carriage returns line feed in .txt file

I have a large text file (output from SQL db) and I need to determine the row count. However, since the source SQL data itself contains carriage returns \r and line feeds \n (NEVER appearing together), the data for some rows spans multiple lines in the output .txt file. The Powershell I'm using below gives me the file line count which is greater than the actual SQL row count. So I need to modify the script to ignore the additional lines - one way of doing it might be just counting the number of times CRLF or \r\n occurs (TOGETHER) in the file and that should be the actual number of rows but I'm not sure how to do it.
Get-ChildItem "." |% {$n = $_; $c = 0; Get-Content -Path $_ -ReadCount 1000 |% { $c += $_.Count }; "$n; $c"} > row_count.txt
I just learned that Get-Content splits and streams each line in a file by CR, CRLF, and LF so that it can read data between operating systems interchangeably:
"1`r2`n3`r`n4" | Out-File .\Test.txt
(Get-Content .\Test.txt).Count
4
Reading the question again, I might have misunderstood your question.
In any case, if you want to split (count) on only a specific character combination:
CR
((Get-Content -Raw .\Test.txt).Trim() -Split '\r').Count
3
LF
((Get-Content -Raw .\Test.txt).Trim() -Split '\n').Count
3
CRLF
((Get-Content -Raw .\Test.txt).Trim() -Split '\r\n').Count # or: -Split [Environment]::NewLine
2
Note the .Trim() method, which removes the extra trailing newline (whitespace) at the end of the file. That newline was appended by Out-File when the test file was written; Get-Content -Raw returns it as part of the content.
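If the file fits in memory, the separators can also be counted directly instead of splitting. The following is only a sketch on an in-memory string (a hypothetical stand-in for the result of Get-Content -Raw), using .NET regex lookarounds to tell a CRLF pair apart from a lone CR or a lone LF:

```powershell
# Stand-in for: $text = Get-Content -Raw .\Test.txt
$text = "1`r2`n3`r`n4`r`n"

# CR immediately followed by LF
$crlfCount = [regex]::Matches($text, '\r\n').Count

# CR that is not followed by LF
$loneCrCount = [regex]::Matches($text, '\r(?!\n)').Count

# LF that is not preceded by CR
$loneLfCount = [regex]::Matches($text, '(?<!\r)\n').Count

"CRLF: $crlfCount; lone CR: $loneCrCount; lone LF: $loneLfCount"
```

Counting matches avoids the off-by-one adjustment that splitting needs when the content ends with a separator.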
Addendum
(Update based on the comment on the memory exception)
I am afraid that there is currently no other option than building your own StreamReader using the ReadBlock method and specifically splitting lines on a CRLF. I have opened a feature request for this issue: -NewLine Parameter to customize line separator for Get-Content
Get-Lines
A possible way to workaround the memory exception errors:
function Get-Lines {
    [CmdletBinding()][OutputType([string])] param(
        [Parameter(ValueFromPipeLine = $True)][string] $Filename,
        [String] $NewLine = [Environment]::NewLine
    )
    Process {
        # Open the reader here: pipeline input is not yet bound in a Begin block.
        [Char[]] $Buffer = New-Object Char[] 4096
        # Convert-Path resolves relative paths against PowerShell's current location.
        $Reader = New-Object -TypeName System.IO.StreamReader -ArgumentList (Convert-Path -LiteralPath $Filename)
        $Rest = '' # Note that a multi-character newline (such as CRLF) could be split at the end of the buffer
        try {
            While ($True) {
                $Length = $Reader.ReadBlock($Buffer, 0, $Buffer.Length)
                if (!$Length) { Break }
                # -Split takes a regex, so escape the separator.
                $Split = ($Rest + [string]::new($Buffer, 0, $Length)) -Split [regex]::Escape($NewLine)
                If ($Split.Count -gt 1) { $Split[0..($Split.Count - 2)] }
                $Rest = $Split[-1]
            }
            $Rest
        }
        finally { $Reader.Close() } # Release the file handle even if the pipeline is stopped
    }
}
Usage
To prevent memory exceptions, it is important that you do not assign the results to a variable or wrap the call in parentheses, as this will stall the PowerShell pipeline and store everything in memory.
$Count = 0
Get-Lines .\Test.txt | ForEach-Object { $Count++ }
$Count
The System.IO.StreamReader.ReadBlock solution that reads the file in fixed-size blocks and performs custom splitting into lines in iRon's helpful answer is the best choice, because it both avoids out-of-memory problems and performs well (by PowerShell standards).
If performance in terms of execution speed isn't paramount, you can take advantage of Get-Content's -Delimiter parameter, which accepts a custom string to split the file content by:
# Outputs the count of CRLF-terminated lines.
(Get-Content largeFile.txt -Delimiter "`r`n" | Measure-Object).Count
Note that -Delimiter employs optional-terminator logic when splitting: that is, if the file content ends in the given delimiter string, no extra, empty element is reported at the end.
This is consistent with the default behavior, where a trailing newline in a file is considered an optional terminator that does not result in an additional, empty line getting reported.
However, in case a -Delimiter string that is unrelated to newline characters is used, a trailing newline is considered a final "line" (element).
A quick example:
# Create a test file without a trailing newline.
# Note the CR-only newline (`r) after 'line 1'
"line1`rrest of line1`r`nline2" | Set-Content -NoNewLine test1.txt
# Create another test file with the same content plus
# a trailing CRLF newline.
"line1`rrest of line1`r`nline2`r`n" | Set-Content -NoNewLine test2.txt
'test1.txt', 'test2.txt' | ForEach-Object {
"--- $_"
# Split by CRLF only and enclose the resulting lines in [...]
Get-Content $_ -Delimiter "`r`n" |
ForEach-Object { "[{0}]" -f ($_ -replace "`r", '`r') }
}
This yields:
--- test1.txt
[line1`rrest of line1]
[line2]
--- test2.txt
[line1`rrest of line1]
[line2]
As you can see, the two test files were processed identically, because the trailing CRLF newline was considered an optional terminator for the last line.

ExpandString breaks EOL

Context
I have a multi line file named DEV.properties. It contains references to ENV variables
ACTIVEMQ_DB_USERNAME=${ACTIVEMQ_DB_USERNAME}
I am writing a ps script to replace this file with one populated with the relevant variables
Problem
Here is how I proceed
#first load variables from a file
Get-Content C:\somewhere\over\the\rainbow\.credentials | Foreach-Object{$var = $_.Split('=');New-Variable -Name $var[0] -Value $var[1]}
$template = Get-Content DEV.properties
$expanded = $ExecutionContext.InvokeCommand.ExpandString($template)
Substitution is successful, but while $template is a multi-line string, all CRLFs seem to have disappeared from $expanded. How can I fix it? Is there a more direct approach than looping through all lines?
$expanded = $template | %{ $ExecutionContext.InvokeCommand.ExpandString($_) }
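One possible alternative to the per-line loop, sketched here under the assumption that PowerShell 3.0 or later is available: read the template with -Raw so that it is a single string, and ExpandString then leaves the line breaks in place. The temporary file and the variable value below are hypothetical stand-ins for DEV.properties and the loaded credentials.

```powershell
# Hypothetical stand-in for DEV.properties, written to the temp folder
$path = Join-Path ([IO.Path]::GetTempPath()) 'DEV.properties.sample'
Set-Content -Path $path -Value 'ACTIVEMQ_DB_USERNAME=${ACTIVEMQ_DB_USERNAME}', 'OTHER_USER=${ACTIVEMQ_DB_USERNAME}'

# Hypothetical value; in the original script this comes from the .credentials file
$ACTIVEMQ_DB_USERNAME = 'demo_user'

# -Raw returns one string including its newlines instead of an array of lines
$template = Get-Content -Path $path -Raw
$expanded = $ExecutionContext.InvokeCommand.ExpandString($template)
$expanded
```

Because $template is a single string here, nothing is joined with $OFS and the original line endings pass through unchanged.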

How to read a file line by line and create another new file with that content using powershell

I have a file 'abc.txt' that contains below lines.
c:myfilepath\filepath\filepath1\file1.csv
c:myfilepath\filepath\filepath1\file2.csv
c:myfilepath\filepath\filepath1\file2.csv
How to loop through the above file 'abc.txt' and read line by line and create another file called 'xyz.txt' that should contains like below. The file name in the path in 'xyz.txt' should be different, see below (ex. newfile_file1.txt)
c:mynewfile\newfilepath\newfilepath1\newfile_file1.txt (<-This is corresponding to file1.csv)
c:mynewfile\newfilepath\newfilepath1\newfile_file2.txt
c:mynewfile\newfilepath\newfilepath1\newfile_file2.txt
I've tried using Get-Content to loop through the file but I just get nothing returned. I'm unclear as to where to put the syntax and how to completely construct it.
This should do it (edited to get file names and paths as requested, and dynamic so the paths in the abc-file are used).
$f = Get-Content C:\temp\abc.txt # this is the contents-file
foreach ($r in $f)
{
$r2 = (Split-Path $r).Replace("\", "\new") + '\newfile_' + [io.path]::GetFileNameWithoutExtension($r) + '.txt'
$r2 = $r2.replace(":\", ":\mynewfile\")
Get-Content $r | Out-File -filepath $r2
}
Assuming all of your file paths start with c:myfilepath\filepath\filepath1, then you can just replace the string then Out-File it.
$File1 = get-content E:\abc.txt
$File1 -replace ('c:myfilepath\\filepath\\filepath1\\', 'c:mynewfile\newfilepath\newfilepath1\newfile_') |
Out-File E:\xyz.txt
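Note the double backslashes \\ in the search pattern: -replace treats it as a regex, so each literal backslash has to be escaped. The replacement string is not a regex pattern and needs no escaping.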
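The same idea can be sketched without hand-escaping by letting [regex]::Escape build the pattern, and with a second -replace to change the extension as the question asks. The sample lines are copied from the question; the variable names are illustrative only.

```powershell
# Stand-in for: $lines = Get-Content E:\abc.txt
$lines = 'c:myfilepath\filepath\filepath1\file1.csv',
         'c:myfilepath\filepath\filepath1\file2.csv'

$oldPrefix = 'c:myfilepath\filepath\filepath1\'
$newPrefix = 'c:mynewfile\newfilepath\newfilepath1\newfile_'

# [regex]::Escape turns the literal prefix into a safe regex pattern;
# the second -replace swaps the .csv extension at the end of each line.
$result = $lines -replace [regex]::Escape($oldPrefix), $newPrefix -replace '\.csv$', '.txt'
$result
```

The result can then be piped to Out-File or Set-Content to create xyz.txt.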

Why are all newlines gone after PowerShell's Get-Content, Regex, and Set-Content?

I want to load a file template into a variable, modify data within the variable and output the modified template to a new location from the variable.
The issue is that PowerShell is removing newlines from my template.
The input file (template file) has Unix line endings which are also required for output since the recipient of the modified version is a Unix-based system.
I have the following code which results in a concatenated one-liner:
[String] $replacement = "Foo Bar"
[String] $template = Get-Content -Path "$pwd\template.sh" -Encoding UTF8
$template = $template -replace '<REPLACE_ME>', $replacement
$template | Set-Content -Path "$pwd\script.sh" -Encoding UTF8
Having the template input:
#!/bin/sh
myvar="<REPLACE_ME>"
echo "my variable: $myvar"
exit 0
Resulted in:
#!/bin/sh myvar="Foo Bar" echo "my variable: $myvar" exit 0
It appears to me that somewhere the LFs were replaced by a single space. Finally, at the end of the script there is an added CR LF which was not present in the template file.
How do I preserve the line endings and prevent adding further (CR LF) wrong line endings to the final script?
For the $replacement variable, you don't really need to specify the type [string], PowerShell will infer that from the assignment.
For the $template variable, [string] is actually wrong. By default, Get-Content will give you an array of strings (i.e. lines) instead of one string, and casting that array to [string] joins the lines with spaces.
But in fact you don't even want to split the input into lines in the first place.
Using -Raw makes Get-Content return the entire file as one string, this way also the line endings (like LF for Linux files) will stay the way they are.
$replacement = "Foo Bar"
$template = Get-Content -Path "$pwd\template.sh" -Encoding UTF8 -Raw
$template = $template -replace '<REPLACE_ME>', $replacement
Set-Content -Path "$pwd\script.sh" -Value $template -Encoding UTF8
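Windows PowerShell (5.1 and earlier) saves UTF-8 files with a BOM when you use -Encoding UTF8; PowerShell 6 and later write UTF-8 without a BOM by default. If you don't want the BOM in Windows PowerShell, you must use a different utility to write the file: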
$UTF8_NO_BOM = New-Object System.Text.UTF8Encoding $False
$replacement = "Foo Bar"
$template = Get-Content -Path "$pwd\template.sh" -Encoding UTF8 -Raw
$template = $template -replace '<REPLACE_ME>', $replacement
[System.IO.File]::WriteAllText("$pwd\script.sh", $template, $UTF8_NO_BOM)
Notes:
PowerShell operators (like -replace) silently operate on arrays. $x -replace "search", "replacement" will perform a replace operation on every member of $x, be that a single string or an array of them.
Recommended reading: PowerShell Set-Content and Out-File what is the difference?
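A small sketch of how an array of lines behaves under a [string] cast and under -replace, using a hypothetical in-memory stand-in for the lines of template.sh and assuming $OFS is unset:

```powershell
# Stand-in for the lines Get-Content would return for template.sh
$lines = '#!/bin/sh', 'myvar="<REPLACE_ME>"', 'exit 0'

# Casting the array to [string] joins the elements with a space.
[string] $joined = $lines

# -replace on the array works element by element and returns an array.
$replaced = $lines -replace '<REPLACE_ME>', 'Foo Bar'

# Joining with LF yields a single string with Unix line endings.
$script = ($replaced -join "`n") + "`n"
```

Here $joined is the one-line result seen in the question, while $script keeps one statement per line.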
Use the -Delimiter "`n" option instead of -Raw. The -Raw option returns the entire content as a single string; it preserves the newline characters, but it is inconvenient if you need to manipulate the content line by line, e.g. skip the header row or skip blank lines.
Get-Content - background info:
By default, the Get-Content cmdlet reads & returns content line-by-line, which means if you pipe a Set-Content or Add-Content to instantly write each-line (being read) into the output file - the newline characters are preserved and written as expected, e.g.:
Get-Content $inputFile | Set-Content $outputFilePath
However, if you store the entire content in a $variable, you receive an array of strings without any separator/delimiter (by default), which means you lose the newline characters. When reading the file with Get-Content, you can use the -Delimiter option to specify a newline character, e.g.:
Get-Content -Delimiter "`n" $fileToRead
HTH.
I think you need to use the -Raw switch with Get-Content in order to load the file as a single string:
[String] $replacement = "Foo Bar"
[String] $template = Get-Content -Path "$pwd\template.sh" -Encoding UTF8 -Raw
$template = $template -replace '<REPLACE_ME>', $replacement
To stop the Windows line ending being added to the end of the script, I think you need to use this .NET method for writing the file:
[io.file]::WriteAllText("$pwd\template.sh",$template)
By default PowerShell attempts to convert your input into an array of strings, one for each line in the file. I think because of the Unix line endings it's not doing this successfully but is subsequently removing the newline characters.
In PowerShell 3.0 we now have a new dynamic parameter, Raw. When
specified, Get-Content ignores newline characters and returns the
entire contents of a file in one string. Raw is a dynamic parameter,
it is available only in file system drives.
https://social.technet.microsoft.com/Forums/windowsserver/en-US/6026b31a-2a0e-4e0a-90b5-355387dce9ac/preventing-newline-with-outfile-or-addcontent?forum=winserverpowershell
I was using Get-Content -Tail, which doesn't allow you to specify -Raw at the same time, but I did have luck with Out-String. So, in your case:
$template = Out-String -InputObject $( Get-Content -Path "$pwd\template.sh" -Encoding UTF8 -Raw)
Or perhaps, if you care about tail:
$template = Out-String -InputObject $(Get-Content -Path "$pwd\template.sh" -tail 4)

Piping Replace to New File

$x = Get-Content($file)
if ($x -match("~")) {
$x -replace("~","~`n") | Out-File $file
}
This is the snippet of code I am using. I have debugged up until this point and the file isn't updated after I replace the tilde character ~ with itself followed by a new line. When I output the result to the command window and comment out the | Out-File $file, the code works fine. When I try to pipe the new result back into the original file, the code doesn't "unwrap" the file.
The replacement works just fine. However, you're inserting just linefeed characters (LF, `n), not the combination of carriage-return and linefeed (CR-LF, `r`n) that Windows uses for encoding line breaks. Because of that you don't see line breaks when opening the file in Notepad. PowerShell accepts both LF and CR-LF as line break encoding, so you see correctly wrapped lines when you output the file there.
Change your code to this and you'll get the expected result:
(Get-Content $file) -replace '~', "~`r`n" | Set-Content $file
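A self-contained sketch of that one-liner, using a hypothetical temporary file. The parentheses around Get-Content matter: they make PowerShell read the whole file and close it before Set-Content opens the same path for writing.

```powershell
# Hypothetical single-line sample file in the temp folder
$file = Join-Path ([IO.Path]::GetTempPath()) 'tilde-demo.txt'
[IO.File]::WriteAllText($file, 'A~B~C')

# Read everything first (parentheses), then write back to the same file.
(Get-Content $file) -replace '~', "~`r`n" | Set-Content $file

# Each segment is now on its own line.
$unwrapped = Get-Content $file
$unwrapped
```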
My mistake. I was calling reader method above.
ForEach ($file in $Path){
$Array = @()
$reader = new-object System.IO.StreamReader($file)
I needed an array to determine if the file needed to be unwrapped to begin with. If the contents took up 1 line, it needed to be unwrapped; if not, it did not. I used a StreamReader to store each line as an element in an array. I forgot to close $reader beforehand, so the file was still open (a producer-consumer issue) and Out-File could not overwrite $file.
Fixed Snippet:
if($Array.length -eq 1){
$x = Get-Content($file)
if($x -match("~")){
$reader.close()
($x -replace("~","~`n")) | Out-File $file
}
}