I am trying to capture the output of running an external command as a string, but the result returned by PowerShell is always an array of strings. It is my understanding that PowerShell splits the output on newlines (Windows or Unix line endings) and stores the result in an array, but I would like these newlines to be preserved, since this is for an svn pre-commit hook script that detects whether a committed file has mixed line endings.
$output = svnlook cat -t $transId -r $repos $filename
At this point line endings have been stripped, and $output is an array of strings, one per line of output.
Redirecting to a file does not seem to work, since this normalizes the line endings. How do you tell PowerShell not to split a command's output?
Thanks!
Pipe the command output into the Out-String cmdlet:
$output = svnlook cat -t $transId -r $repos $filename | Out-String
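As a quick sanity check (a small sketch reusing the variables from the question), you can confirm that the result is now a single string rather than an array:
$output = svnlook cat -t $transId -r $repos $filename | Out-String
$output -is [string]   # True
$output -is [array]    # False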
Just wrap the "inner" command in parentheses, and use the -join operator with "`n" on the right.
# Preserve line breaks
$result = (ipconfig) -join "`n";
# Remove line breaks
$result = (ipconfig) -join '';
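Applied to the command from the question, the same idea looks like this (a sketch; joining with "`n" produces LF separators):
$output = (svnlook cat -t $transId -r $repos $filename) -join "`n"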
If you need it as a single string, you can simply cast the result to [string]; note that the array elements are then joined with the $OFS separator, which defaults to a single space:
[string]$output = svnlook cat -t $transId -r $repos $filename
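A minimal sketch of that behavior (my illustration, not from the original answer):
$lines = 'one', 'two'
[string] $joined = $lines   # 'one two' - elements joined with a space
$OFS = "`n"                 # change the separator to LF
[string] $joined = $lines   # "one`ntwo"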
So for this script, I'm having an issue: if I run it multiple times, it also overwrites the second occurrence of "start-sleep -s 10" with "start-sleep -s 30", when I need that occurrence to stay "start-sleep -s 10". It works great when run once, but when the script is run more than once, the second instance also gets changed to "start-sleep -s 30", and I need it to remain "start-sleep -s 10" every time the script is run, without altering the rest of the file. Is there a way in PS to prevent this? Possibly have the script search only lines 0-115, for example, because the second instance of "start-sleep -s 10" is located at line 145? What I essentially need is for the script to find the first instance of "start-sleep -s 10", replace it with "start-sleep -s 30", and leave the second instance of "start-sleep -s 10" alone every time I run this script. I posted an original question about this before and have added it at the end of this post.
$ScriptPath = "C:\ScriptsFolder\powershell.ps1"
$newContent = switch -Wildcard -File $ScriptPath {
    # if this line contains `Start-Sleep -s 10`
    '*Start-Sleep -s 10*' {
        # if we previously matched this
        if ($skip) {
            # output this line
            $_
            # and skip the logic below
            continue
        }
        # if we didn't match before (i.e. `$skip` doesn't exist yet),
        # replace `Start-Sleep -s 10` with `Start-Sleep -s 30` on this line
        $_.Replace('Start-Sleep -s 10', 'Start-Sleep -s 30')
        # and set this variable to `$true` for future matches
        $skip = $true
    }
    # if the above condition was not met
    Default {
        # output this line as-is, don't change anything
        $_
    }
}
$newContent | Set-Content $ScriptPath
Original Question: I need to find and replace one occurrence of a string in a PowerShell script (the script is about 250 lines long). The string "start-sleep -s 10" appears twice within the script. I need to change only the first occurrence to "start-sleep -s 30" and keep the second occurrence as "start-sleep -s 10". The issue I'm having is that there are a couple of variants of the script I have to edit, so my approach was to locate the range of lines where the first occurrence appears, make the change, then save the script and keep everything else as is. I'm new to PowerShell, so I'm not sure how to go about this. I keep seeing articles online about how to use Get-Content and Set-Content to find and replace text in a file, but I need to change only the first occurrence of "start-sleep -s 10" and keep the second one as is.
To replace only a certain number of matches, you can use the [regex] class's Replace() method, which allows you to specify the maximum number of replacements:
$scriptPath = '.\sleep_text.txt'
$scriptContent = Get-Content $scriptPath -Raw
$find = 'start-sleep -s 10'
$replacement = 'start-sleep -s 30'
$numberOfReplacements = 1
$regex = [regex]$find
$regex.Replace( $scriptContent, $replacement, $numberOfReplacements ) | Set-Content -Path $scriptPath
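One thing to keep in mind (my note, not part of the original answer): the $find string is treated as a regular expression pattern. It contains no regex metacharacters here, but for arbitrary literal text you would escape it first, and the match can also be made case-insensitive if the script's casing varies. A hedged sketch:
$regex = [regex]::new([regex]::Escape($find), 'IgnoreCase')   # literal, case-insensitive match
$regex.Replace($scriptContent, $replacement, $numberOfReplacements) | Set-Content -Path $scriptPath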
Not processing the same file twice
I agree with Theo's comment: if you do not want to process the same file twice, just save the file to a different location or under a different name that you can filter out during pre-processing. If you don't like that approach, another option is to give your script a different way of knowing which files have already been processed.
Here is an example of one way: append a comment to the bottom of the script during processing. The script looks for this comment in an if statement and, if it is found, just displays a warning that the file has already been processed rather than processing it again.
$scriptPath = '.\sleep_text.txt'
$scriptContent = Get-Content $scriptPath -Raw
if ($scriptContent -notmatch '# processed') {
    $find = 'start-sleep -s 10'
    $replacement = 'start-sleep -s 30'
    $numberOfReplacements = 1
    $regex = [regex]$find
    ($regex.Replace( $scriptContent, $replacement, $numberOfReplacements )) + "`n# processed" | Set-Content -Path $scriptPath
}
else {
    Write-Warning "$scriptPath already processed"
}
WARNING: .\sleep_text.txt already processed
I am using a script in Git Bash which performs a few curl calls to HTTP endpoints that expect and produce protobuf.
The curl output is piped to a custom proto2json.exe, and finally the result is saved to a JSON file:
#!/bin/bash
SCRIPT_DIR=$(dirname $0)
JSON2PROTO="$SCRIPT_DIR/json2proto.exe"
PROTO2JSON="$SCRIPT_DIR/proto2json.exe"
echo '{"key1":"value1","version":3}' | $JSON2PROTO -v 3 > request.dat
curl --insecure --data-binary @request.dat --output - https://localhost/protobuf | $PROTO2JSON -v 3 > response.json
The script works well, and now I am trying to port it to PowerShell:
$SCRIPT_DIR = Split-Path -parent $PSCommandPath
$JSON2PROTO = "$SCRIPT_DIR/json2proto.exe"
$PROTO2JSON = "$SCRIPT_DIR/proto2json.exe"
@{
    key1 = 'value1'
    version = 3
} | ConvertTo-Json | & $JSON2PROTO -v 3 > request.dat
Unfortunately, when I compare the binary files generated in Git Bash and in PowerShell, I see that the latter file has additional zero bytes inserted.
Is the GitHub issue #1908 related to my issue?
It looks like you're ultimately after this:
$SCRIPT_DIR = Split-Path -parent $PSCommandPath
$JSON2PROTO = "$SCRIPT_DIR/json2proto.exe"
$PROTO2JSON = "$SCRIPT_DIR/proto2json.exe"
# Make sure that the output from your $JSON2PROTO executable is correctly decoded
# as UTF-8.
# You may want to restore the original encoding later.
[Console]::OutputEncoding = [System.Text.Utf8Encoding]::new()
# Capture the output lines from calling the $JSON2PROTO executable.
# Note: PowerShell captures a *single* output line as-is, and
# *multiple* ones *as an array*.
[array] $output =
    @{
        key1 = 'value1'
        version = 3
    } | ConvertTo-Json | & $JSON2PROTO -v 3
# Filter out empty lines to extract the one line of interest.
[string] $singleOutputLineOfInterest = $output -ne ''
# Write a BOM-less UTF-8 file with the given text as-is,
# without appending a newline.
[System.IO.File]::WriteAllText(
    "$PWD/request.dat",
    $singleOutputLineOfInterest
)
As for what you tried:
In PowerShell, > is an effective alias of the Out-File cmdlet, whose default output character encoding in Windows PowerShell is "Unicode" (UTF-16LE) - which is what you saw - and, in PowerShell (Core) 7+, BOM-less UTF8. To control the character encoding, call Out-File or, for text input, Set-Content with the -Encoding parameter.
Note that you may also have to ensure that an external program's output is first properly decoded, which happens based on the encoding stored in [Console]::OutputEncoding - see this answer for more information.
Note that you can't avoid these decoding + re-encoding steps in PowerShell as of v7.2.4, because the PowerShell pipeline currently cannot serve as a conduit for raw bytes, as discussed in this answer, which also links to the GitHub issue you mention.
Finally, note that both Out-File and Set-Content by default append a trailing, platform-native newline to the output file. While -NoNewLine suppresses that, it also suppresses newlines between multiple input objects, so you may have to use the -join operator to manually join the inputs with newlines in the desired format, e.g. (1, 2) -join "`n" | Set-Content -NoNewLine out.txt
If, in Windows PowerShell, you want to create UTF-8 files without a BOM, you can't use a file-writing cmdlet and must instead use .NET APIs directly (PowerShell (Core) 7+, by contrast, produces BOM-less UTF-8 files by default, consistently). .NET APIs have always created BOM-less UTF-8 files by default; e.g.:
[System.IO.File]::WriteAllLines() writes the elements of an array as lines to an output file, with each line terminated with a platform-native newline, i.e. CRLF (0xD 0xA) on Windows, and LF (0xA) on Unix-like platforms.
[System.IO.File]::WriteAllText() writes a single (potentially multi-line) string as-is to an output file.
Important: Always pass full paths to file-related .NET APIs, because PowerShell's current location (directory) usually differs from .NET's.
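For example, here is a hedged sketch of that kind of path handling, reusing $singleOutputLineOfInterest from the code above; Join-Path plus (Get-Location).ProviderPath is one way to build a full, filesystem-native path for a file that doesn't exist yet:
# Build a full, filesystem-native path before handing it to a .NET API.
$fullPath = Join-Path (Get-Location).ProviderPath 'request.dat'
[System.IO.File]::WriteAllText($fullPath, $singleOutputLineOfInterest)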
I have four text files in the following directory that have varying EOL characters:
C:\Sandbox: 1.txt, 2.txt, 3.txt, 4.txt
I would like to write a PowerShell script that will loop through all files in the directory, find the EOL characters being used in each file, and write the results to a new file named EOL.txt.
Sample contents of EOL.txt:
1.txt UNIX(LF)
2.txt WINDOWS(CRLF)
3.txt WINDOWS(CRLF)
4.txt UNIX(LF)
I know that to loop through files I will need something like the following, but I'm not sure how to read each file's EOL:
Get-ChildItem "C:\Sandbox" -Filter *.txt |
Foreach-Object {
}
OR
Get-Content "C:\Sandbox\*" -EOL | Out-File -FilePath "C:\Sandbox\EOL.txt"
## note that -EOL is not a valid Get-Content parameter
Try the following:
Get-ChildItem C:\Sandbox\*.txt -Exclude EOL.txt |
    Get-Content -Raw |
    ForEach-Object {
        $newlines = [regex]::Matches($_, '\r?\n').Value | Select-Object -Unique
        $newLineDescr =
            switch ($newlines.Count) {
                0 { 'N/A' }
                2 { 'MIXED' }
                default { ('UNIX(LF)', 'WINDOWS(CRLF)')[$newlines -eq "`r`n"] }
            }
        # Construct and output a custom object for the file at hand.
        [pscustomobject] @{
            FileName      = $_.PSChildName
            NewlineFormat = $newLineDescr
        }
    } # | Out-File ... to save to a file - see comments below.
The above outputs something like:
FileName NewlineFormat
-------- -------------
1.txt    UNIX(LF)
2.txt    WINDOWS(CRLF)
3.txt    N/A
4.txt    MIXED
N/A means that no newlines are present, MIXED means that both CRLF and LF newlines are present.
You can save the output:
directly in the for-display format shown above by appending a > redirection or piping (|) to Out-File, as in your question.
alternatively, using a structured text format better suited to programmatic processing, such as CSV; e.g.:
Export-Csv -NoTypeInformation -Encoding utf8 C:\Sandbox\EOL.txt
Note:
Short of reading the raw bytes of a text file one by one or in batches, the only way to analyze the newline format is to read the file in full and search for newline sequences. Get-Content -Raw reads a given file in full.
[regex]::Matches($_, '\r?\n').Value extracts all newline sequences - whether CRLF or LF - from the file's content, and Select-Object -Unique reduces them to the set of distinct sequences.
('UNIX(LF)', 'WINDOWS(CRLF)')[$newlines -eq "`r`n"] is a convenient, but somewhat obscure emulation of the following ternary conditional:
$newlines -eq "`r`n" ? 'WINDOWS(CRLF)' : 'UNIX(LF)', which could be used as-is in PowerShell (Core) 7+ but, unfortunately, isn't supported in Windows PowerShell.
The technique relies on a [bool] value getting coerced to an [int] value when used as an array index ($true -> 1, $false -> 0), thereby selecting the appropriate element from the input array.
If you don't mind the verbosity, you can use a regular if statement as an expression (i.e., you can assign its output directly to a variable: $foo = if ...), which works in both PowerShell editions:
if ($newlines -eq "`r`n") { 'WINDOWS(CRLF)' } else { 'UNIX(LF)' }
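A tiny, self-contained illustration of both forms (my sketch, not part of the original answer):
$newlines = "`r`n"
# [bool] array index: $true coerces to 1, selecting the second element.
('UNIX(LF)', 'WINDOWS(CRLF)')[$newlines -eq "`r`n"]    # -> WINDOWS(CRLF)
# if statement used as an expression, assignable to a variable.
$descr = if ($newlines -eq "`r`n") { 'WINDOWS(CRLF)' } else { 'UNIX(LF)' }
$descr                                                 # -> WINDOWS(CRLF)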
Simpler alternative via WSL, if installed:
WSL comes with the file utility, which analyzes the content of files and reports summary information, including newline formats.
While you get no control over the output format, which invariably includes additional information, such as the file's character encoding, the command is much simpler:
Set-Location C:\Sandbox
wsl file *.txt
Caveats:
This approach is fundamentally limited to files on local drives.
If changing to the target directory is not an option, relative paths would need their \ instances translated to /, and full paths would need drive specifiers such as C: translated to /mnt/c (lowercase!).
Interpreting the output:
If the term line terminators (referring to newlines) is not mentioned in the output (for text files), Unix (LF) newlines only are implied.
Windows (CRLF) newlines only are implied if you see "with CRLF line terminators".
In case of a mix of LF and CRLF, you'll see "with CRLF, LF line terminators".
In the absence of newlines, you'll see "with no line terminators".
I would like to include the result of a string concatenation command as part of a call using the pdftk command line tool. Here's what I tried:
$items_to_merge = '"' + $($(get-childitem *.pdf).name -join '" "') + '"'
echo $items_to_merge
pdftk $items_to_merge cat output test.pdf
The individual pdf file names themselves include spaces, so that's why I'm wrapping every file name with double quotes.
However it turns out that when I run this command I get pdftk errors as follows:
Error: Unable to find file
I'm a bit surprised because enumerating the files by hand and wrapping with double quotes has always worked. Why does it fail when I do it programmatically?
Going simpler works OK in Windows Sandbox, with an array of strings, with or without extra double quotes. I printed a webpage to PDF to make a sample PDF.
$items_to_merge = (get-childitem *.pdf).name
$items_to_merge
pdftk $items_to_merge cat output test.pdf
Or with a wildcard:
pdftk *.pdf cat output test.pdf
In this kind of situation, Invoke-Expression is your friend:
$items_to_merge = '"' + $($(get-childitem *.pdf).name -join '" "') + '"'
echo $items_to_merge
invoke-expression "pdftk $items_to_merge cat output test.pdf"
Let PowerShell do the quoting of the file names. By using @ instead of $ in front of the variable name, PowerShell expands the array into individual arguments (aka array splatting) which will be quoted if necessary.
$items_to_merge = (get-childitem *.pdf).name
pdftk @items_to_merge cat output test.pdf
As commenter js2010 pointed out, splatting is not even necessary. This also works:
$items_to_merge = (get-childitem *.pdf).name
pdftk $items_to_merge cat output test.pdf
Apparently PowerShell implicitly splats an array argument when passed to a native command.
Personally I still prefer explicit splatting as it consistently works with both native and PowerShell commands.
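To illustrate the difference (a sketch with a made-up helper function, not part of the original answer): with a PowerShell command, passing the array variable with $ binds the whole array to the first parameter, whereas @ splats its elements across parameters; a native command receives the elements as individual arguments either way.
function Show-Two ($a, $b) { "a=$a b=$b" }   # hypothetical helper
$pair = 'x', 'y'
Show-Two $pair    # a=x y b=   (whole array bound to $a)
Show-Two @pair    # a=x b=y    (elements splatted positionally)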
Need to replace \x0d\x0a with \x2c\x0d\x0a in a file.
I can do it relatively easily on Unix:
awk '(NR>1){gsub("\r$",",\r")}1' $file > "fixed_$file"
Need help with implementing this in PowerShell.
Thank you in advance.
Assuming that you're running this on Windows (where \r\n (CRLF) newlines are the default), the following command is the equivalent of your awk command:
Get-Content $file | ForEach-Object {
    if ($_.ReadCount -eq 1) { $_ } else { $_ -replace '$', ',' }
} | Set-Content "fixed_$file"
Caveat: The character encoding of the input file is not preserved, and Set-Content uses a default, which you can override with -Encoding.
In Windows PowerShell, this default is the system's "ANSI" encoding, whereas in PowerShell Core it is BOM-less UTF-8.
Get-Content $file reads the input file line by line.
The ForEach-Object loop passes the 1st line ($_.ReadCount -eq 1) through as-is ($_), and appends , (which is what escape sequence \x2c in your awk command represents) to all others ($_ -replace '$', ',').
Note: $_ + ',' or "$_," are simpler alternatives for appending a comma; the regex-based -replace operator was used here to highlight the PowerShell feature that is similar to awk's gsub().
Set-Content then writes the resulting lines to the target file, terminating each with the platform-appropriate newline sequence, which on Windows is CRLF (\r\n).
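As for the encoding caveat above, a hedged variant that pins the output encoding explicitly (adjust the encoding name as needed; note that in Windows PowerShell, utf8 writes a UTF-8 BOM):
Get-Content $file | ForEach-Object {
    if ($_.ReadCount -eq 1) { $_ } else { $_ -replace '$', ',' }
} | Set-Content "fixed_$file" -Encoding utf8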