I'm having trouble re-assembling certain filenames (and discarding the rest) from a text file. The filenames are split up (usually on three lines) and there is always a blank line after each filename. I only want to keep filenames that begin with OPEN or FOUR. An example is:
OPEN.492820.EXTR
A.STANDARD.38383
333

FOUR.383838.282.
STAND.848484.NOR
MAL.3939

CLOSE.3480384.ST
ANDARD.39393939.
838383
The output I'd like would be:
OPEN.492820.EXTRA.STANDARD.38383333
FOUR.383838.282.STAND.848484.NORMAL.3939
Thanks for any suggestions!
The following worked for me, you can give it a try.
See https://regex101.com/r/JuzXOb/1 for the Regex explanation.
$source = 'fullpath/to/inputfile.txt'
$destination = 'fullpath/to/resultfile.txt'
[regex]::Matches(
(Get-Content $source -Raw),
'(?msi)^(OPEN|FOUR)(.*?|\s*?)+([\r\n]$|\z)'
).Value.ForEach({ -join($_ -split '\r?\n').ForEach('Trim') }) |
Out-File $destination
For testing:
$txt = @'
OPEN.492820.EXTR
A.STANDARD.38383
333

FOUR.383838.282.
STAND.848484.NOR
MAL.3939

CLOSE.3480384.ST
ANDARD.39393939.
838383

OPEN.492820.EXTR
A.EXAMPLE123

FOUR.383838.282.
STAND.848484.123
ZXC
'@
[regex]::Matches(
$txt,
'(?msi)^(OPEN|FOUR)(.*?|\s*?)+([\r\n]$|\z)'
).Value.ForEach({ -join($_ -split '\r?\n').ForEach('Trim') })
Output:
OPEN.492820.EXTRA.STANDARD.38383333
FOUR.383838.282.STAND.848484.NORMAL.3939
OPEN.492820.EXTRA.EXAMPLE123
FOUR.383838.282.STAND.848484.123ZXC
Read the file one line at a time and keep concatenating them until you encounter a blank line, at which point you output the concatenated string and repeat until you reach the end of the file:
# this variable will keep track of the partial file names
$fileName = ''
# use a switch to read the file and process each line
switch -Regex -File ('path\to\file.txt') {
    # when we see a blank line...
    '^\s*$' {
        # ... we output the name if it starts with the right word
        if ($fileName -cmatch '^(OPEN|FOUR)') { $fileName }
        # and then start over
        $fileName = ''
    }
    default {
        # must be a non-blank line, concatenate it to the previous ones
        $fileName += $_
    }
}
# remember to check and output the last one
if ($fileName -cmatch '^(OPEN|FOUR)') {
    $fileName
}
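To try the accumulate-until-blank-line idea without touching a file, here is a minimal sketch that works on an in-memory array of lines. The function name `Join-NameFragments` and its parameters are hypothetical and only serve the illustration; the sample data is the example from the question, with the blank separator lines written as empty strings.

```powershell
# Hypothetical helper: joins line fragments that are separated by blank lines
# and emits only the joined names that match the wanted prefix pattern.
function Join-NameFragments {
    param(
        [string[]] $Lines,
        [string]   $Pattern = '^(OPEN|FOUR)'
    )
    $current = ''
    foreach ($line in $Lines) {
        if ($line -match '^\s*$') {
            # blank line: the name is complete, emit it if it qualifies
            if ($current -cmatch $Pattern) { $current }
            $current = ''
        }
        else {
            # non-blank line: append the fragment to the name being built
            $current += $line.Trim()
        }
    }
    # the last name may not be followed by a blank line
    if ($current -cmatch $Pattern) { $current }
}

$sample = 'OPEN.492820.EXTR', 'A.STANDARD.38383', '333', '',
          'FOUR.383838.282.', 'STAND.848484.NOR', 'MAL.3939', '',
          'CLOSE.3480384.ST', 'ANDARD.39393939.', '838383'

$result = Join-NameFragments -Lines $sample
$result
```

For a real file, the output of `Get-Content` could be passed as `-Lines`, at the cost of reading the whole file into memory first.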
Every night I get a text file that needs to be edited manually.
The file contains approximately 250 rows. Three example rows:
112;20-21;32;20-21;24;0;2;248;271;3;3;;
69;1;4;173390;5;0;0;5460;5464;3;3;;
24;7;4;173390;227;0;0;0;0;3;3;;
I need to replace the two last values in each row.
All rows ending with ;0;3;3;; should be replaced with ;0;17;18;; (the last one, solved)
The logic for the other two:
If the row contains a '-', it should replace the last two values, from ;3;3;; to ;21;21;;
If it doesn't contain a '-', it should replace the last two values, from ;3;3;; to ;22;22;;
This is my script
foreach ($file in Get-ChildItem *.*)
{
(Get-Content $file) -replace ';0;3;3;;',';0;17;18;;' -replace ';3;3;;',';21;21;;' | Out-File -Encoding ASCII $file-new
}
If I could add a '-' at the end of each row containing a '-', I could solve the issue with a modified script:
(Get-Content $file) -replace ';0;3;3;;',';0;17;18;;' -replace ';3;3;;-',';22;22;;' -replace ';3;3;;',';21;21;;' | Out-File -Encoding ASCII $file-new
But how do I add a '-' at the end of a row if the row contains a '-'?
Best Regards
Mr DXF
I tried with Select-String, but I can't figure it out...
if select-string -pattern '-' {append-text '-'|out-file -encoding ascii $file-new
else end
}
The following might do the trick, it uses a switch with the -Regex flag to read your files and match lines with regular expressions.
foreach ($file in Get-ChildItem *.* -File) {
& {
switch -Regex -File $file.FullName {
# if the line ends with `;3;3;;` but is not preceded by `;0`
'(?<!;0);3;3;;$' {
# if it contains a `-`
if($_.Contains('-')) {
$_ -replace ';3;3;;$', ';21;21;;'
continue
}
# if it doesn't contain a `-`
$_ -replace ';3;3;;$', ';22;22;;'
continue
}
# if the line ends with `;0;3;3;;`
';0;3;3;;$' {
$_ -replace ';0;3;3;;$', ';0;17;18;;'
continue
}
# if none of the above conditions are matched,
# output as is
Default { $_ }
}
} | Set-Content "$($file.BaseName)-new$($file.Extension)" -Encoding ascii
}
Using the example rows from the question, the end result would be:
112;20-21;32;20-21;24;0;2;248;271;21;21;;
69;1;4;173390;5;0;0;5460;5464;22;22;;
24;7;4;173390;227;0;0;0;0;17;18;;
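The three rules can also be checked in isolation before running them against real files. The following is a small sketch, assuming the rules exactly as stated in the question; the function name `Convert-Row` is hypothetical and the rows are the three examples from the question.

```powershell
# Hypothetical helper that applies the three replacement rules to one row.
function Convert-Row {
    param([string] $Row)
    if ($Row -match ';0;3;3;;$') {
        # rows ending with ;0;3;3;; get ;0;17;18;;
        $Row -replace ';0;3;3;;$', ';0;17;18;;'
    }
    elseif ($Row -match ';3;3;;$') {
        if ($Row.Contains('-')) {
            # rows containing a '-' get ;21;21;;
            $Row -replace ';3;3;;$', ';21;21;;'
        }
        else {
            # all other rows ending with ;3;3;; get ;22;22;;
            $Row -replace ';3;3;;$', ';22;22;;'
        }
    }
    else {
        # anything else passes through unchanged
        $Row
    }
}

$rows = '112;20-21;32;20-21;24;0;2;248;271;3;3;;',
        '69;1;4;173390;5;0;0;5460;5464;3;3;;',
        '24;7;4;173390;227;0;0;0;0;3;3;;'

$converted = $rows | ForEach-Object { Convert-Row $_ }
$converted
```

Testing the `;0;3;3;;` case first means the more general `;3;3;;` test needs no lookbehind.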
I have a PowerShell script that scans log files and replaces text when a match is found. The lookup list is currently 500 lines, and I plan to double or triple it. The log files range from 400KB to 800MB in size.
Currently, with the code below, a 42MB file takes 29 minutes. Can anyone see a way to make this faster?
I tried replacing ForEach-Object with ForEach-ObjectFast, but that made the script take significantly longer. I also tried changing the first ForEach-Object to a for loop, but it still took ~29 minutes.
$lookupTable = @{
    'aaa:bbb:123' = 'WORDA:WORDB:NUMBER1'
    'bbb:ccc:456' = 'WORDB:WORDBC:NUMBER456'
}
Get-Content -Path $inputfile | ForEach-Object {
    $line = $_
    $lookupTable.GetEnumerator() | ForEach-Object {
        if ($line -match $_.Key)
        {
            $line = $line -replace $_.Key, $_.Value
        }
    }
    $line
} | Set-Content -Path $outputfile
Since you say your input file could be 800MB in size, reading and updating the entire content in memory might not be feasible.
The way to go then is to use a fast line-by-line method, and the fastest I know of is switch:
# hardcoded here for demo purposes.
# In real life you get/construct these from the Get-ChildItem
# cmdlet you use to iterate the log files in the root folder..
$inputfile = 'D:\Test\test.txt'
$outputfile = 'D:\Test\test_new.txt' # absolute full file path because we use .Net here
# because we are going to Append to the output file, make sure it doesn't exist yet
if (Test-Path -Path $outputfile -PathType Leaf) { Remove-Item -Path $outputfile -Force }
$lookupTable = @{
    'aaa:bbb:123' = 'WORDA:WORDB:NUMBER1'
}
# create a regex string from the Keys of your lookup table,
# merging the strings with a pipe symbol (the regex 'OR').
# your Keys could contain characters that have special meaning in regex, so we need to escape those
$regexLookup = '({0})' -f (($lookupTable.Keys | ForEach-Object { [regex]::Escape($_) }) -join '|')
# create a StreamWriter object to write the lines to the new output file
# Note: use an ABSOLUTE full file path for this
$streamWriter = [System.IO.StreamWriter]::new($outputfile, $true) # $true for Append
switch -Regex -File $inputfile {
$regexLookup {
# do the replacement using the value in the lookup table.
# because in one line there may be multiple matches to replace
# get a System.Text.RegularExpressions.Match object to loop through all matches
$line = $_
$match = [regex]::Match($line, $regexLookup)
while ($match.Success) {
# $match.Value holds the literal text that matched, so it can be used directly
# as the lookup key and in a literal (non-regex) String.Replace call
$line = $line.Replace($match.Value, $lookupTable[$match.Value])
$match = $match.NextMatch()
}
$streamWriter.WriteLine($line)
}
default { $streamWriter.WriteLine($_) } # write unchanged
}
# dispose of the StreamWriter object
$streamWriter.Dispose()
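A possible variation on the per-line replacement is to let the regex engine do all replacements in a line in a single pass, using a match evaluator that looks each match up in the table. This is only a sketch under a few assumptions: the keys are literal strings, matching should be case-insensitive (as `switch -Regex` is by default), and the sample line is made up for illustration.

```powershell
$lookupTable = @{
    'aaa:bbb:123' = 'WORDA:WORDB:NUMBER1'
    'bbb:ccc:456' = 'WORDB:WORDBC:NUMBER456'
}

# Longest keys first, so that a key which is a prefix of another key
# cannot shadow the longer one in the alternation.
$keys    = $lookupTable.Keys | Sort-Object Length -Descending
$pattern = ($keys | ForEach-Object { [regex]::Escape($_) }) -join '|'
$regex   = [regex]::new($pattern, 'IgnoreCase')

# The evaluator receives each Match and returns its replacement text.
# Lookups in a @{} hashtable are case-insensitive, so the matched text
# can be used as the key as-is.
[System.Text.RegularExpressions.MatchEvaluator] $evaluator = {
    param($m)
    $lookupTable[$m.Value]
}

# made-up sample line containing both keys
$line   = 'x aaa:bbb:123 y bbb:ccc:456 z'
$result = $regex.Replace($line, $evaluator)
$result
```

Inside the `switch -Regex -File` loop, the `$regex.Replace(...)` call could take the place of the inner `while` loop; whether it is actually faster for a given table and log file would need to be measured.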
I have a large text file (output from SQL db) and I need to determine the row count. However, since the source SQL data itself contains carriage returns \r and line feeds \n (NEVER appearing together), the data for some rows spans multiple lines in the output .txt file. The Powershell I'm using below gives me the file line count which is greater than the actual SQL row count. So I need to modify the script to ignore the additional lines - one way of doing it might be just counting the number of times CRLF or \r\n occurs (TOGETHER) in the file and that should be the actual number of rows but I'm not sure how to do it.
Get-ChildItem "." |% {$n = $_; $c = 0; Get-Content -Path $_ -ReadCount 1000 |% { $c += $_.Count }; "$n; $c"} > row_count.txt
I just learned that Get-Content splits and streams the lines in a file on CR, CRLF, and LF, so that it can read data from different operating systems interchangeably:
"1`r2`n3`r`n4" | Out-File .\Test.txt
(Get-Content .\Test.txt).Count
4
Reading the question again, I might have misunderstood your question.
In any case, if you want to split (count) on only a specific character combination:
CR
((Get-Content -Raw .\Test.txt).Trim() -Split '\r').Count
3
LF
((Get-Content -Raw .\Test.txt).Trim() -Split '\n').Count
3
CRLF
((Get-Content -Raw .\Test.txt).Trim() -Split '\r\n').Count # or: -Split [Environment]::NewLine
2
Note the .Trim() method, which removes the trailing newline that Out-File appended to the file and that Get-Content -Raw preserves.
Addendum
(Update based on the comment on the memory exception)
I am afraid that there is currently no other option than building your own StreamReader using the ReadBlock method and specifically splitting lines on CRLF. I have opened a feature request for this issue: -NewLine Parameter to customize line separator for Get-Content
Get-Lines
A possible way to workaround the memory exception errors:
function Get-Lines {
[CmdletBinding()][OutputType([string])] param(
[Parameter(ValueFromPipeLine = $True)][string] $Filename,
[String] $NewLine = [Environment]::NewLine
)
Begin {
[Char[]] $Buffer = new-object Char[] 10
$Reader = New-Object -TypeName System.IO.StreamReader -ArgumentList (Get-Item($Filename))
$Rest = '' # Note that a multiple character newline (as CRLF) could be split at the end of the buffer
}
Process {
While ($True) {
$Length = $Reader.ReadBlock($Buffer, 0, $Buffer.Length)
if (!$length) { Break }
$Split = ($Rest + [string]::new($Buffer[0..($Length - 1)])) -Split $NewLine
If ($Split.Count -gt 1) { $Split[0..($Split.Count - 2)] }
$Rest = $Split[-1]
}
}
End {
$Rest
$Reader.Dispose() # release the file handle
}
}
Usage
To prevent memory exceptions, it is important that you do not assign the results to a variable or wrap the call in parentheses, as this will stall the PowerShell pipeline and store everything in memory.
$Count = 0
Get-Lines .\Test.txt | ForEach-Object { $Count++ }
$Count
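If only the number of CRLF sequences is needed, the lines do not have to be emitted at all. The sketch below counts CRLF pairs block by block, carrying a trailing CR over to the next block so that a pair split across a block boundary is still counted. The function name `Get-CrLfCount`, the `-BufferSize` parameter and the temporary test file are assumptions made for the illustration.

```powershell
# Hypothetical helper: counts CRLF sequences without materializing lines.
function Get-CrLfCount {
    param(
        [string] $Path,
        [int]    $BufferSize = 1MB   # characters per block
    )
    # .NET needs an absolute path, so resolve it first
    $reader = [System.IO.StreamReader]::new((Convert-Path -LiteralPath $Path))
    try {
        $buffer = [char[]]::new($BufferSize)
        $count  = 0
        $carry  = ''
        while (($read = $reader.ReadBlock($buffer, 0, $buffer.Length)) -gt 0) {
            $text = $carry + [string]::new($buffer, 0, $read)
            if ($text[$text.Length - 1] -eq [char]13) {
                # the block ends in CR: hold it back, its LF may follow
                $carry = "`r"
                $text  = $text.Substring(0, $text.Length - 1)
            }
            else {
                $carry = ''
            }
            $count += [regex]::Matches($text, '\r\n').Count
        }
        $count
    }
    finally {
        $reader.Dispose()
    }
}

# Same content as the Test.txt example above, written to a temporary file
$tmp = Join-Path ([System.IO.Path]::GetTempPath()) 'crlf-count-demo.txt'
[System.IO.File]::WriteAllText($tmp, "1`r2`n3`r`n4`r`n")

$countDefault = Get-CrLfCount -Path $tmp
$countTiny    = Get-CrLfCount -Path $tmp -BufferSize 3   # forces split CRLF pairs
$countDefault
```

The tiny buffer in the second call only exists to exercise the boundary handling; for real files a large block size is the sensible choice.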
The System.IO.StreamReader.ReadBlock solution that reads the file in fixed-size blocks and performs custom splitting into lines in iRon's helpful answer is the best choice, because it both avoids out-of-memory problems and performs well (by PowerShell standards).
If performance in terms of execution speed isn't paramount, you can take advantage of
Get-Content's -Delimiter parameter, which accepts a custom string to split the file content by:
# Outputs the count of CRLF-terminated lines.
(Get-Content largeFile.txt -Delimiter "`r`n" | Measure-Object).Count
Note that -Delimiter employs optional-terminator logic when splitting: that is, if the file content ends in the given delimiter string, no extra, empty element is reported at the end.
This is consistent with the default behavior, where a trailing newline in a file is considered an optional terminator that does not result in an additional, empty line getting reported.
However, in case a -Delimiter string that is unrelated to newline characters is used, a trailing newline is considered a final "line" (element).
A quick example:
# Create a test file without a trailing newline.
# Note the CR-only newline (`r) after 'line 1'
"line1`rrest of line1`r`nline2" | Set-Content -NoNewLine test1.txt
# Create another test file with the same content plus
# a trailing CRLF newline.
"line1`rrest of line1`r`nline2`r`n" | Set-Content -NoNewLine test2.txt
'test1.txt', 'test2.txt' | ForEach-Object {
"--- $_"
# Split by CRLF only and enclose the resulting lines in [...]
Get-Content $_ -Delimiter "`r`n" |
ForEach-Object { "[{0}]" -f ($_ -replace "`r", '`r') }
}
This yields:
--- test1.txt
[line1`rrest of line1]
[line2]
--- test2.txt
[line1`rrest of line1]
[line2]
As you can see, the two test files were processed identically, because the trailing CRLF newline was considered an optional terminator for the last line.
I am reading in line by line of a text file. If I see a specific string, I locate the first and last of a specific character, use two substrings to create a smaller string, then replace the line in the text file.
The difficult part: I have the line of text stored in a variable, but cannot figure out how to write this new line over the old line in the text document.
Excuse the crude code - I have been testing things and only started playing with PowerShell a few hours ago.
foreach($line in [System.IO.File]::ReadLines("C:\BatchPractice\test.txt"))
{
Write-Output $line
if ($line.Contains("dsa")) {
Write-Output "TRUEEEEE"
}
$positionF = $line.IndexOf("\")+1
$positionL = $line.LastIndexOf("\")+1
$lengthT = $line.Length
Write-Output ($positionF)
Write-Output $positionL
Write-Output $lengthT
if($line.Contains("\")){
Write-Output "Start"
$combine = $line.Substring(0,$positionF-1) + $line.Substring($positionL,($lengthT-$positionL))
Write-Output $combine
$line1 = $line.Substring(0,$positionF-1)
$line2 = $line.Substring($positionL,($lengthT-$positionL))
$combined = $line1 + $line2
Write-Output $combined
Write-Output "Close"
}
}
You can read the file into an array with Get-Content and write it back with Set-Content:
$file=(Get-Content "C:\BatchPractice\test.txt")
Then you can edit it like arrays:
$file[LINE_NUMBER]="New line"
Where LINE_NUMBER is the line number starting from 0.
And then overwrite to file:
$file|Set-Content "C:\BatchPractice\test.txt"
You can implement this in code. Create a variable $i=0 and increment it at the end of loop. Here $i will be the line number at each iteration.
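Put together, the index-based approach might look like the sketch below. It assumes that only lines containing 'dsa' should change and that, as in the question's Substring logic, everything from the first backslash through the last backslash is dropped. The temporary file and its two sample lines are made up so that the example does not touch a real file.

```powershell
# Made-up demo file in the temp folder
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'replace-line-demo.txt'
'abc\def\ghi\jkl', 'abc\dsa\ghi\jkl' | Set-Content $path

# @() keeps the result an array even if the file has a single line
$file = @(Get-Content $path)

for ($i = 0; $i -lt $file.Count; $i++) {
    $line = $file[$i]
    if ($line.Contains('dsa') -and $line.Contains('\')) {
        $first = $line.IndexOf([char]'\')
        $last  = $line.LastIndexOf([char]'\')
        # keep the text before the first backslash and after the last one
        $file[$i] = $line.Substring(0, $first) + $line.Substring($last + 1)
    }
}

# overwrite the file with the edited array
$file | Set-Content $path
Get-Content $path
```

Using a `for` loop gives the line number `$i` directly, so no separate counter has to be maintained.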
HTH
Based on your code it seems you want to take any line that contains 'dsa' and remove the contents after the first backslash up until the last backslash. If that's the case I'd recommend simplifying your code with regex. First I made a sample file since none was provided.
$tempfile = New-TemporaryFile
@'
abc\def\ghi\jkl
abc\dsa\ghi\jkl
zyx\vut\dsa\srq
zyx\vut\srq\pon
'@ | Set-Content $tempfile -Encoding UTF8
Now we will read in all lines (unless this is a massive file)
$text = Get-Content $tempfile -Encoding UTF8
Next we'll make a regex object with the pattern we want to replace. The double backslash is to escape the backslash since it has meaning to regex.
$regex = [regex]'(?<=.+\\).+\\'
Now we will loop over every line, if it has dsa in it we will run the replace against it, otherwise we will output the line.
$text | ForEach-Object {
if($_.contains('dsa'))
{
$regex.Replace($_,'')
}
else
{
$_
}
} -OutVariable newtext
You'll see the output on the screen, but it's also captured in the $newtext variable. I recommend ensuring it is the output you are after prior to writing.
abc\def\ghi\jkl
abc\jkl
zyx\srq
zyx\vut\srq\pon
Once confirmed, simply write it back to the file.
$newtext | Set-Content $tempfile -Encoding UTF8
You can obviously combine the steps as well.
$text | ForEach-Object {
if($_.contains('dsa'))
{
$regex.Replace($_,'')
}
else
{
$_
}
} | Set-Content $tempfile -Encoding UTF8
I want to add a line to a series of cfg files in subfolders after a variable line.
some text ...
light.0 = some text
light.1 = some text
...
light.n = some text
... some text
Each text file has a varied nth data line.
All I want to add is the (n+1)th data line after those nth lines in each cfg files in subfolders.
light.(n+1) = some text
I want to carry out this task in PowerShell.
# Get all the config files, and loop over them
Get-ChildItem "d:\test" -Recurse -Include *.cfg | ForEach-Object {
# Output a progress message
Write-Host "Processing file: $_"
# Make a backup copy of the file, forcibly overwriting one if it's there
Copy-Item -LiteralPath $_ -Destination "$_+.bak" -Force
# Read the lines in the file
$Content = Get-Content -LiteralPath $_ -Raw
# A regex which matches the last "light..." line
# - line beginning with light.
# - with a number next (capture the number)
# - then equals, text up to the end of the line
# - newline characters
# - not followed by another line beginning with light
$Regex = '^light\.(?<num>\d+) =.*?$(?![\r\n]+^light)'
# A scriptblock to calculate the regex replacement
# needs to output the line which was captured
# and calculate the increased number
# and output the new line as well
$ReplacementCalculator = {
param($RegexMatches)
$LastLine = $RegexMatches[0].Value
$Number = [int]$RegexMatches.groups['num'].value
$NewNumber = $Number + 1
"$LastLine`nlight.$NewNumber = some new text"
}
# Do the replacement and insert the new line
$Content = [regex]::Replace($Content, $Regex, $ReplacementCalculator, 'Multiline')
# Update the file with the new content
$Content | Set-Content -Path $_
}
(* I'm sure I read that somewhere)
Assumes the 'light' lines are contiguous with no blocks of other text in the middle, and that they are ordered, with the highest number being last. You might have to play with the line endings \r\n in the regex and `n in the replacement text, to make them match up.
(Sorry about the regex)
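If the line-ending fiddling gets in the way, a line-based variant sidesteps it, because Get-Content (without -Raw) already splits the file into lines. The following is a sketch under the same assumptions as above: the 'light' lines are ordered and the highest number comes last. The function name `Add-NextLightLine`, the `-NewText` parameter and the sample lines are hypothetical.

```powershell
# Hypothetical helper: inserts "light.(n+1) = ..." after the last light.n line.
function Add-NextLightLine {
    param(
        [string[]] $Lines,
        [string]   $NewText = 'some new text'   # placeholder value
    )
    # find the index and number of the last "light.<number> = ..." line
    $lastIndex  = -1
    $lastNumber = -1
    for ($i = 0; $i -lt $Lines.Count; $i++) {
        if ($Lines[$i] -match '^light\.(?<num>\d+)\s*=') {
            $lastIndex  = $i
            $lastNumber = [int] $Matches['num']
        }
    }
    # no light lines at all: return the input unchanged
    if ($lastIndex -lt 0) { return $Lines }

    # everything up to and including the last light line
    $Lines[0..$lastIndex]
    # the new line with the incremented number
    "light.$($lastNumber + 1) = $NewText"
    # the remainder of the file, if there is any
    if ($lastIndex + 1 -lt $Lines.Count) {
        $Lines[($lastIndex + 1)..($Lines.Count - 1)]
    }
}

$cfg = 'some text ...', 'light.0 = some text', 'light.1 = some text', '... some text'
$out = Add-NextLightLine -Lines $cfg
$out
```

For the real files, something like `Add-NextLightLine -Lines (Get-Content -LiteralPath $_) | Set-Content -LiteralPath $_` inside the ForEach-Object loop should work, since the parenthesized Get-Content finishes reading before Set-Content opens the file.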