PowerShell 5: Compact code by combining foreach, begin, process, and replace command

Can I get the same results with less code?
The code searches sample.bat for the strings AROUND LINE {1-9999} and LINE2 {1-9999} and replaces {1-9999} with the {line number} the code is on.
sample.bat:
AROUND LINE 262
LINE2 1964
Old code:
gc $env:temp\sample.bat | foreach -Begin {$lc = 1} -Process {
$_ -replace "AROUND LINE \d*", "AROUND LINE $lc";
$lc += 1
} | Out-File -Encoding Ascii $env:temp\results.bat
(gc $env:temp\results.bat) | foreach -Begin {$lc = 1} -Process {
$_ -replace "LINE2 \d*", "LINE2 $lc";
$lc += 1
} | Out-File -Encoding Ascii $env:temp\results.bat
Current code:
(gc $env:temp\sample.bat) | foreach -Begin {$lc = 1} -Process {
$_ -replace "AROUND LINE \d*", "AROUND LINE $lc";
$lc += 1
} | foreach -Begin {$lc = 1} -Process {
$_ -replace "LINE2 \d*", "LINE2 $lc";
} | Out-File -Encoding Ascii $env:temp\sample.bat
Expected results:
AROUND LINE 1
LINE2 2
Actual results:
AROUND LINE 1
LINE2 2

You can make this work with a single regex:
gc $env:temp\sample.bat | foreach -Begin {$lc = 1} -Process {
$_ -replace '(?<=AROUND LINE |LINE2 )\d+', $lc++
} | Set-Content -Encoding Ascii $env:temp\results.bat
Note that I'm using '...' (single quotes) rather than "..." (double quotes) to enclose the regex, which is preferable to rule out potential confusion arising from PowerShell performing string expansion (interpolation) first.
$lc++ returns the current $lc value and increments it by 1 afterwards, obviating the need for the $lc += 1 statement.
Also, I've replaced Out-File with Set-Content, as they're functionally the same for saving strings, but the latter is faster.
Finally, to match one or more digits, use \d+ rather than \d*.
A note on $_ -replace '(?<=AROUND LINE |LINE2 )\d+', $lc++:
Regex (?<=AROUND LINE |LINE2 )\d+ uses a look-behind assertion ((?<=...)) to look for either (|) the string "AROUND LINE " or the string "LINE2 " (each with a trailing space) before one or more (+) digits (\d).
The look-behind assertion is by design not considered part of the match, so that the substring getting replaced is limited to the run of digits, i.e., the number only.
$lc++ is the replacement operand: it returns the current value of variable $lc and increments its value afterwards; note that even though $lc is a number ([int]), PowerShell automatically converts it to a string for the replacement.
Generally, though, you can simply chain -replace operations:
# ...
$_ -replace 'AROUND LINE \d+', "AROUND LINE $lc" -replace 'LINE2 \d+', "LINE2 $lc"
++$lc
# ...
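As a self-contained illustration of the look-behind approach, here is a minimal sketch that works on an in-memory array instead of sample.bat. The sample lines and the names $lines and $result are made up for the demonstration, and the increment is written as a separate statement so that each step is visible.

```powershell
# Hypothetical sample input standing in for the content of sample.bat.
$lines = 'echo start', 'AROUND LINE 262', 'LINE2 1964'

$lc = 1
$result = foreach ($line in $lines) {
    # Replace only the digits that follow one of the two markers;
    # the look-behind keeps the marker text itself out of the match.
    $line -replace '(?<=AROUND LINE |LINE2 )\d+', $lc
    # As a standalone statement, the increment produces no output.
    $lc++
}

$result
```

Lines without a marker pass through unchanged, while the counter still advances, so each replaced number reflects the position of its line.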

Related

Powershell - Count number of carriage returns line feed in .txt file

I have a large text file (output from SQL db) and I need to determine the row count. However, since the source SQL data itself contains carriage returns \r and line feeds \n (NEVER appearing together), the data for some rows spans multiple lines in the output .txt file. The Powershell I'm using below gives me the file line count which is greater than the actual SQL row count. So I need to modify the script to ignore the additional lines - one way of doing it might be just counting the number of times CRLF or \r\n occurs (TOGETHER) in the file and that should be the actual number of rows but I'm not sure how to do it.
Get-ChildItem "." |% {$n = $_; $c = 0; Get-Content -Path $_ -ReadCount 1000 |% { $c += $_.Count }; "$n; $c"} > row_count.txt
I just learned that Get-Content splits and streams the lines of a file on CR, CRLF, and LF alike, so that it can read files from different operating systems interchangeably:
"1`r2`n3`r`n4" | Out-File .\Test.txt
(Get-Content .\Test.txt).Count
4
Reading the question again, I might have misunderstood your question.
In any case, if you want to split (count) on only a specific character combination:
CR
((Get-Content -Raw .\Test.txt).Trim() -Split '\r').Count
3
LF
((Get-Content -Raw .\Test.txt).Trim() -Split '\n').Count
3
CRLF
((Get-Content -Raw .\Test.txt).Trim() -Split '\r\n').Count # or: -Split [Environment]::NewLine
2
Note the .Trim() method, which removes the trailing newline at the end of the file. That newline was appended by Out-File when the test file was written; Get-Content -Raw simply preserves it.
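If only the number of CRLF pairs matters, counting the matches directly avoids both the split and the .Trim() call. The following is a minimal sketch on an in-memory string; the sample text and the name $rowCount are assumptions for illustration. Like the -Raw examples above, it needs the whole text in memory.

```powershell
# Hypothetical sample: two rows, each terminated by CRLF, with a lone CR
# inside the first row and a lone LF inside the second row.
$text = "row1 part a`rrow1 part b`r`nrow2 part a`nrow2 part b`r`n"

# Count only CR immediately followed by LF; lone CR or LF are ignored.
$rowCount = [regex]::Matches($text, '\r\n').Count
$rowCount
```

For a real file, $text could be filled with Get-Content -Raw, with the same memory caveat as above.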
Addendum
(Update based on the comment on the memory exception)
I am afraid that there is currently no other option than building your own StreamReader-based reader using the ReadBlock method and splitting lines specifically on CRLF. I have opened a feature request for this issue: -NewLine Parameter to customize line separator for Get-Content
Get-Lines
A possible way to workaround the memory exception errors:
function Get-Lines {
    [CmdletBinding()][OutputType([string])] param(
        [Parameter(ValueFromPipeLine = $True)][string] $Filename,
        [String] $NewLine = [Environment]::NewLine
    )
    Begin {
        # The buffer is deliberately tiny here; a larger size is likely preferable for big files
        [Char[]] $Buffer = new-object Char[] 10
        $Reader = New-Object -TypeName System.IO.StreamReader -ArgumentList (Get-Item $Filename).FullName
        $Rest = '' # Note that a multiple character newline (as CRLF) could be split at the end of the buffer
    }
    Process {
        While ($True) {
            $Length = $Reader.ReadBlock($Buffer, 0, $Buffer.Length)
            if (!$Length) { Break }
            # Prepend the unfinished tail of the previous block before splitting
            $Split = ($Rest + [string]::new($Buffer[0..($Length - 1)])) -Split $NewLine
            If ($Split.Count -gt 1) { $Split[0..($Split.Count - 2)] }
            $Rest = $Split[-1]
        }
    }
    End {
        $Rest
        $Reader.Dispose() # Release the file handle
    }
}
Usage
To prevent the memory exceptions, it is important that you do not assign the results to a variable or wrap the call in parentheses, as this will stall the PowerShell pipeline and store everything in memory.
$Count = 0
Get-Lines .\Test.txt | ForEach-Object { $Count++ }
$Count
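If only the count is needed and not the lines themselves, the same block-reading idea can be reduced to a character scan. The function below is a minimal sketch, not a tuned implementation: the name Measure-CrLf and the 64K buffer size are assumptions, and the character loop will be slow on very large files by PowerShell standards.

```powershell
function Measure-CrLf {
    param([Parameter(Mandatory)][string] $Path)

    $cr = [char]13
    $lf = [char]10
    # Resolve to a full file-system path, because .NET does not follow
    # PowerShell's current location.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $reader = [System.IO.StreamReader]::new($fullPath)
    try {
        $buffer = [char[]]::new(65536)   # assumed block size
        $count = 0
        $prevWasCr = $false              # carries over block boundaries
        while (($read = $reader.ReadBlock($buffer, 0, $buffer.Length)) -gt 0) {
            for ($i = 0; $i -lt $read; $i++) {
                $ch = $buffer[$i]
                # Count an LF only when the previous character was a CR.
                if ($prevWasCr -and $ch -eq $lf) { $count++ }
                $prevWasCr = ($ch -eq $cr)
            }
        }
        $count
    }
    finally {
        $reader.Dispose()
    }
}
```

Because $prevWasCr survives from one block to the next, a CRLF pair that straddles a buffer boundary is still counted once.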
The System.IO.StreamReader.ReadBlock solution that reads the file in fixed-size blocks and performs custom splitting into lines in iRon's helpful answer is the best choice, because it both avoids out-of-memory problems and performs well (by PowerShell standards).
If performance in terms of execution speed isn't paramount, you can take advantage of Get-Content's -Delimiter parameter, which accepts a custom string to split the file content by:
# Outputs the count of CRLF-terminated lines.
(Get-Content largeFile.txt -Delimiter "`r`n" | Measure-Object).Count
Note that -Delimiter employs optional-terminator logic when splitting: that is, if the file content ends in the given delimiter string, no extra, empty element is reported at the end.
This is consistent with the default behavior, where a trailing newline in a file is considered an optional terminator that does not result in an additional, empty line getting reported.
However, in case a -Delimiter string that is unrelated to newline characters is used, a trailing newline is considered a final "line" (element).
A quick example:
# Create a test file without a trailing newline.
# Note the CR-only newline (`r) after 'line1'
"line1`rrest of line1`r`nline2" | Set-Content -NoNewLine test1.txt
# Create another test file with the same content plus
# a trailing CRLF newline.
"line1`rrest of line1`r`nline2`r`n" | Set-Content -NoNewLine test2.txt
'test1.txt', 'test2.txt' | ForEach-Object {
"--- $_"
# Split by CRLF only and enclose the resulting lines in [...]
Get-Content $_ -Delimiter "`r`n" |
ForEach-Object { "[{0}]" -f ($_ -replace "`r", '`r') }
}
This yields:
--- test1.txt
[line1`rrest of line1]
[line2]
--- test2.txt
[line1`rrest of line1]
[line2]
As you can see, the two test files were processed identically, because the trailing CRLF newline was considered an optional terminator for the last line.

Splitting in Powershell

I want to be able to split some text out of a txtfile:
For example:
Brackets#Release 1.11.6#Path-to-Brackets
Atom#v1.4#Path-to-Atom
I just want to have the "Release 1.11.6" part. I am doing a where-object starts with Brackets but I don't know the full syntax. Here is my code:
"Get-Content -Path thisfile.txt | Where-Object{$_ < IM STUCK HERE > !
You could do this:
((Get-Content thisfile.txt | Where-Object { $_ -match '^Brackets' }) -Split '#')[1]
This uses the -match operator to filter out any lines that don't start with Brackets (the ^ special regex character indicates that what follows must be at the beginning of the line). Then it uses the -Split operator to split those lines on # and then it uses the array index [1] to get the second element of the split (arrays start at 0).
Note that this will throw an error if the split on # doesn't return at least two elements and it assumes that the text you want is always the second of those elements.
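To sidestep the error case mentioned above, the index access can be guarded. This is a minimal sketch on in-memory lines; the $lines, $parts, and $release names are made up for illustration, and the sample data is taken from the question.

```powershell
# Sample lines from the question, held in memory instead of thisfile.txt.
$lines = 'Brackets#Release 1.11.6#Path-to-Brackets', 'Atom#v1.4#Path-to-Atom'

$release = $lines |
    Where-Object { $_ -match '^Brackets#' } |
    ForEach-Object {
        $parts = $_ -split '#'
        # Emit the second field only if the split produced one.
        if ($parts.Count -ge 2) { $parts[1] }
    }

$release
```

Lines that start with Brackets but contain no # simply produce no output instead of an indexing error.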
$bracketsRelease = Get-Content -path thisfile.txt | foreach-object {
if ( $_ -match 'Brackets#(Release [^#]+)#' )
{
$Matches[1]
}
}
or
(select-string -Path file.txt -Pattern 'Brackets#(Release [^#]+)#').Matches[0].Groups[1].value

powershell replace command if line starts with a specific character

I have a text file that I would like to read and do some replacements using powershell only if the line starts with a specific character.
Say I want to change all the dashes (-) to an 'x' if and only if the line starts with a y.
I tried using the command
(Get-Content trial.log2) | Foreach-Object {$_ -replace "-", 'x'} | Set-Content trial.log2
However, it actually replaces all occurrences of the dash, not only those on lines that start with a y.
Can this be also done if I want to have multiple find replace and string manipulation using one get content command?
I have another string manipulation but only if it starts with an F
If line starts with an F, then get first 4 characters of the line, then append 'NEW' then get the next characters from character 20 to 30.
if line starts with a y, then do a replace of - with an X.
$F=(get-content $file) -like 'F*'
(Get-Content $file) | Foreach-Object {
$_ -replace "^F.+", -join("$F".Substring(0,4), "$NEW3",
} | Set-Content trial.log2
Get-Content trial.log2 | ForEach-Object {
if ( $_ -match '^y' ) {
$_ -replace '-', 'X'
}
else {
$_
}
} | Set-Content trial.log3
However, if i do this, texts are being written twice. I think there is something wrong with how I look for the line that starts with the F
Any help is appreciated. Thanks!
You can use a look-behind ((?<=pattern)) to assert that a dash is replaced only when it is preceded by a y at the start of the line (followed by any other characters):
(Get-Content trial.log2) | Foreach-Object {$_ -replace '(?<=^y.*)-','x'} | Set-Content trial.log2
How about something like:
Get-Content trial.log2 | ForEach-Object {
if ( $_ -match '^y' ) {
$_ -replace '-', 'x'
}
else {
$_
}
} | Out-File trial.log2.temp
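Both rules from the question can be handled in a single pass by branching on the first character. The sketch below works on in-memory lines and rests on assumptions: "characters 20 to 30" is read as 1-based and inclusive (11 characters), the appended text is the literal NEW, and F lines shorter than 30 characters are passed through unchanged. The sample lines and variable names are hypothetical.

```powershell
# Hypothetical sample lines; the F line is assembled so that
# characters 20 through 30 (1-based) are 'abcdefghijk'.
$fLine = 'FABC' + ('.' * 15) + 'abcdefghijk' + 'tail'
$lines = 'y-abc-def', $fLine, 'other-line'

$result = foreach ($line in $lines) {
    if ($line -cmatch '^y') {
        # Rule for y lines: replace every dash.
        $line -replace '-', 'x'
    }
    elseif ($line -cmatch '^F' -and $line.Length -ge 30) {
        # Rule for F lines: first 4 characters + 'NEW' + characters 20..30.
        $line.Substring(0, 4) + 'NEW' + $line.Substring(19, 11)
    }
    else {
        # Everything else is emitted once, unchanged.
        $line
    }
}

$result
```

Because each line goes through exactly one branch, nothing is written twice, which was the symptom described in the question.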

-notmatch with ... (3 dots)

I have a strange problem with my PowerShell CSV tool. I have tried to write a small check that filters out certain names and characters. These names/characters are in a textfile like this:
XXX
nana
YYY
...
DDD
I do the check like this:
$reader = [System.IO.File]::OpenText($fc_file.Text)
try {
for() {
$line = $reader.ReadLine()
if ($line -eq $null) { break }
# process the line
Import-Csv $tempfile -Delimiter $delimeter -Encoding $char |
where {$_.$fc_suchfeld -notmatch $line} |
Export-Csv $tempstorage -Delimiter $delimeter -Encoding $char -NoTypeInfo
It works great until the line with the 3 dots. At this point almost all lines are deleted. How can I solve this problem?
The -match operator does regular expression matches. . is a metacharacter in regular expressions, matching any character except newlines. Thus a regular expression ... matches any line with at least 3 characters. If you want to use the lines from $fc_file as literal string matches you need to escape them:
... | where {$_.$fc_suchfeld -notmatch [regex]::Escape($line)} | ...
or do a wildcard match:
... | where {$_.$fc_suchfeld -notlike "*$line*"} | ...
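The difference between the unescaped and the escaped pattern can be seen on a small in-memory list. This is a minimal sketch; the $names values and the variable names are made up for illustration.

```powershell
# Hypothetical field values and the problematic filter line.
$names  = 'XXX', 'nana', 'YYY', 'a...b', 'DDD'
$filter = '...'

# Unescaped: '...' is a regex meaning "any three characters",
# so every value with at least three characters is filtered out.
$regexFiltered = @($names | Where-Object { $_ -notmatch $filter })

# Escaped: only values that contain three literal dots are filtered out.
$literalFiltered = @($names | Where-Object { $_ -notmatch [regex]::Escape($filter) })

"Unescaped keeps $($regexFiltered.Count) values"
"Escaped keeps $($literalFiltered.Count) values"
```

Here the unescaped filter removes every entry, while the escaped one removes only 'a...b'.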

Problems with replacing newline

I am trying to replace the following string with PowerShell:
...
("
Intel(R) Network Connections 14.2.100.0
","
14.2.100.0
")
...
The code that I use is:
Get-Content $logfilepath |
Foreach-Object { $_ -replace '`r`n`r`n', 'xx'} |
Set-Content $logfilepath_new
But I have had no success. Can someone tell me where the error is?
First, you are using single quotes in the replace string -
'`r`n`r`n'
that means they are treated verbatim and not as newline characters, so you have to use -
"`r`n`r`n"
To replace, read the file as string and use the Replace() method
$content = [System.IO.File]::ReadAllText("test.txt")
$content.Replace("`r`n`r`n","xx")
Get-Content returns an array of lines, so CRLF is essentially your delimiter. Two CRLF sequences back to back would be interpreted as the end of the current line, followed by an empty line, so no line (object) should contain '`r`n`r`n'. A multi-line regex replace would probably be a better choice.
As an alternative method, using only PowerShell cmdlets:
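The points above can be checked on an in-memory string. This is a minimal sketch with a made-up sample; for a file, the content would have to be read as one string first (for example with Get-Content -Raw) so that the newlines are still present.

```powershell
# Hypothetical sample text containing one double CRLF and one single CRLF.
$content = "first`r`n`r`nsecond`r`nthird"

# Single quotes keep the backticks literal; double quotes expand them.
$literalLength  = '`r`n'.Length   # backtick, r, backtick, n
$expandedLength = "`r`n".Length   # carriage return, line feed

# Literal replacement with the .NET string method.
$viaMethod = $content.Replace("`r`n`r`n", 'xx')

# Regex replacement; the regex engine interprets \r and \n itself,
# so single quotes are fine here.
$viaRegex = $content -replace '\r\n\r\n', 'xx'

$viaMethod -eq $viaRegex
```

Both variants replace only the double CRLF and leave the single CRLF between the last two lines untouched.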
Get-Content $logfilepath |
Foreach-Object -Begin { $content="" } -Process { $content += $_ ; $content += "xx" } -End { $content } |
Set-Content $logfilepath_new
I used the following code to replace somestring with newline:
$nl = [System.Environment]::NewLine
$content = $content.Replace( somestring, $nl )