I am trying to remove carriage returns from a file. Every line should end with a "}", but some lines contain a carriage return, so the "}" ends up on the next line. Here is an example of my code and file, but it doesn't work. Thank you very much!
Original File
The line with "INGENIERIA Y SERVICIOS LOS NOGALES SPA" is the one that causes the problem.
Expected Result
Code:
$InputFile='Original.txt'
$OutPutFile='NewFile.txt'
(Get-Content $InputFile) | ForEach-Object -Begin {
$results = @()
} -Process {
$out = $_.Split("}").GetUpperBound(0)
[int]$iOut = [int]$out
if($iOut -lt 1){$_.Replace("`r`n","")}else{$_}
} | Set-Content $OutPutFile
Thanks!
You may do the following:
(Get-Content $InputFile -Raw) -replace '(?<!})\r\n' |
Set-Content $outputfile
Reading a file using Get-Content without the -Raw or -ReadCount parameters, reads each line and outputs it as an array element. A line is output once its delimiter has been found. The default delimiter for Get-Content is the end of line character. -Raw allows newline characters to be preserved and the file is read as a single string.
(?<!}) is a negative lookbehind matching only when the preceding character is not a }. Then we simply match \r\n for CRLF.
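As a quick check of the lookbehind, here is a minimal sketch that runs the same -replace on an in-memory string instead of a file. The sample text is made up for illustration and stands in for what Get-Content -Raw would return.

```powershell
# Hypothetical sample: the second record is broken across two lines.
$sample = "{a}`r`n{b`r`n}`r`n{c}`r`n"

# Remove only the CRLF sequences that are NOT preceded by '}'.
$fixed = $sample -replace '(?<!})\r\n'

# Show the repaired text, one record per line.
$fixed -split "`r`n"
```

Only the CRLF after "{b" is removed, because every other CRLF directly follows a "}".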
Related
I have a large text file (output from SQL db) and I need to determine the row count. However, since the source SQL data itself contains carriage returns \r and line feeds \n (NEVER appearing together), the data for some rows spans multiple lines in the output .txt file. The Powershell I'm using below gives me the file line count which is greater than the actual SQL row count. So I need to modify the script to ignore the additional lines - one way of doing it might be just counting the number of times CRLF or \r\n occurs (TOGETHER) in the file and that should be the actual number of rows but I'm not sure how to do it.
Get-ChildItem "." |% {$n = $_; $c = 0; Get-Content -Path $_ -ReadCount 1000 |% { $c += $_.Count }; "$n; $c"} > row_count.txt
I just learned that Get-Content splits and streams each line in a file by CR, CRLF, and LF so that it can read data between operating systems interchangeably:
"1`r2`n3`r`n4" | Out-File .\Test.txt
(Get-Content .\Test.txt).Count
4
Reading the question again, I might have misunderstood your question.
In any case, if you want to split (count) on only a specific character combination:
CR
((Get-Content -Raw .\Test.txt).Trim() -Split '\r').Count
3
LF
((Get-Content -Raw .\Test.txt).Trim() -Split '\n').Count
3
CRLF
((Get-Content -Raw .\Test.txt).Trim() -Split '\r\n').Count # or: -Split [Environment]::NewLine
2
Note the .Trim() method, which removes the extra newline (whitespace) at the end of the file. Out-File appended that newline when the test file was written, and Get-Content -Raw preserves it.
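If the file fits in memory, the CRLF sequences can also be counted directly instead of splitting, which is the approach the question suggests. This is a minimal sketch that assumes a temporary test file with made-up content; it is not suitable for files too large to read with -Raw.

```powershell
# Create a temporary test file: CR-only, LF-only and CRLF line breaks mixed.
$path = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllText($path, "1`r2`n3`r`n4`r`n")

# Count only the places where CR is immediately followed by LF.
$crlfCount = [regex]::Matches((Get-Content -Raw $path), '\r\n').Count
$crlfCount

Remove-Item $path
```

When every row is terminated by CRLF, this count equals the number of rows.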
Addendum
(Update based on the comment on the memory exception)
I am afraid that there is currently no other option than building your own StreamReader using the ReadBlock method and specifically splitting lines on CRLF. I have opened a feature request for this issue: -NewLine Parameter to customize line separator for Get-Content
Get-Lines
A possible way to work around the memory exception errors:
function Get-Lines {
[CmdletBinding()][OutputType([string])] param(
[Parameter(ValueFromPipeLine = $True)][string] $Filename,
[String] $NewLine = [Environment]::NewLine
)
Begin {
[Char[]] $Buffer = new-object Char[] 10
$Reader = New-Object -TypeName System.IO.StreamReader -ArgumentList (Get-Item($Filename))
$Rest = '' # Note that a multiple character newline (as CRLF) could be split at the end of the buffer
}
Process {
While ($True) {
$Length = $Reader.ReadBlock($Buffer, 0, $Buffer.Length)
if (!$length) { Break }
$Split = ($Rest + [string]::new($Buffer[0..($Length - 1)])) -Split $NewLine
If ($Split.Count -gt 1) { $Split[0..($Split.Count - 2)] }
$Rest = $Split[-1]
}
}
End {
$Rest
}
}
Usage
To prevent the memory exceptions, it is important that you do not assign the results to a variable or wrap the call in parentheses, as this will stall the PowerShell pipeline and store everything in memory.
$Count = 0
Get-Lines .\Test.txt | ForEach-Object { $Count++ }
$Count
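If only the number of CRLF-terminated rows is needed, the lines themselves do not have to be built at all. The sketch below counts CR+LF pairs while reading fixed-size blocks. Measure-CrLf is a hypothetical helper name, the 64 KB buffer size is an arbitrary assumption, and the character loop will be slow by PowerShell standards on very large files.

```powershell
# Hypothetical helper: counts CRLF pairs without loading the whole file.
function Measure-CrLf {
    param([Parameter(Mandatory)][string] $Path)

    # StreamReader needs a full path; relative PowerShell paths are resolved first.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $reader = [System.IO.StreamReader]::new($fullPath)
    try {
        $buffer = [char[]]::new(65536)   # assumed block size
        $count = 0
        $previousWasCR = $false          # carries a CR across block boundaries
        while (($read = $reader.ReadBlock($buffer, 0, $buffer.Length)) -gt 0) {
            for ($i = 0; $i -lt $read; $i++) {
                $char = $buffer[$i]
                if ($previousWasCR -and $char -eq [char]10) { $count++ }
                $previousWasCR = ($char -eq [char]13)
            }
        }
        $count
    }
    finally {
        $reader.Dispose()
    }
}

# Demo on a temporary file with mixed line breaks.
$demoPath = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllText($demoPath, "1`r2`n3`r`n4`r`n")
$demoCount = Measure-CrLf -Path $demoPath
$demoCount
Remove-Item $demoPath
```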
The System.IO.StreamReader.ReadBlock solution that reads the file in fixed-size blocks and performs custom splitting into lines in iRon's helpful answer is the best choice, because it both avoids out-of-memory problems and performs well (by PowerShell standards).
If performance in terms of execution speed isn't paramount, you can take advantage of
Get-Content's -Delimiter parameter, which accepts a custom string to split the file content by:
# Outputs the count of CRLF-terminated lines.
(Get-Content largeFile.txt -Delimiter "`r`n" | Measure-Object).Count
Note that -Delimiter employs optional-terminator logic when splitting: that is, if the file content ends in the given delimiter string, no extra, empty element is reported at the end.
This is consistent with the default behavior, where a trailing newline in a file is considered an optional terminator that does not result in an additional, empty line getting reported.
However, in case a -Delimiter string that is unrelated to newline characters is used, a trailing newline is considered a final "line" (element).
A quick example:
# Create a test file without a trailing newline.
# Note the CR-only newline (`r) after 'line 1'
"line1`rrest of line1`r`nline2" | Set-Content -NoNewLine test1.txt
# Create another test file with the same content plus
# a trailing CRLF newline.
"line1`rrest of line1`r`nline2`r`n" | Set-Content -NoNewLine test2.txt
'test1.txt', 'test2.txt' | ForEach-Object {
"--- $_"
# Split by CRLF only and enclose the resulting lines in [...]
Get-Content $_ -Delimiter "`r`n" |
ForEach-Object { "[{0}]" -f ($_ -replace "`r", '`r') }
}
This yields:
--- test1.txt
[line1`rrest of line1]
[line2]
--- test2.txt
[line1`rrest of line1]
[line2]
As you can see, the two test files were processed identically, because the trailing CRLF newline was considered an optional terminator for the last line.
Good day,
With the script below I would like to turn the following input text file into the output shown further down.
Input:
Klaus;Müller;Straße;PLZ;Ort;;;;;DE12345;
Klaus;Müller;Straße;PLZ;Ort;;;;;DE12345678;
Klaus;Müller;Straße;PLZ;Ort;;;;;DE999999;
Klaus;Müller;Straße;PLZ;Ort;;;;;DE7777777;
Klaus;Müller;Straße;PLZ;Ort;;;;;DE7777779;
Klaus;Müller;Straße;PLZ;Ort;;;;;DE777777987;
Klaus;Müller;Straße;PLZ;Ort;;;;;DE7777779765;
Klaus;Müller;Straße;PLZ;Ort;;;;;DE77777797634;
Klaus;Müller;Straße;PLZ;Ort;;;;;DE7777779763465;
Klaus;Müller;Straße;PLZ;Ort;;;;;DE77777797623435435;
Output:
Klaus;Müller;Straße;PLZ;Ort;;;;;DE12345;DE12345678;DE999999;DE7777777;DE7777779;DE777777987;DE7777779765;DE77777797634;DE7777779763465;DE77777797623435435;
The script takes the last value from the following lines and appends them to the first line at the end and adds semicolons:
Import-Csv input.txt -delimiter ";" -Header (1..20)
1..9 | %{$data[0].($_+10) = $data[$_].10}
($data[0] | convertto-csv -delimiter ";" -NoType | select -skip 1) -replace '"' | out-file output.txt
gc test_neu.txt
If I save this into a .ps1 file, it doesn't work. Can anyone tell me why?
You don't assign the output of Import-Csv to anything. The first line should be: $data = Import-Csv input.txt -delimiter ";" -Header (1..20). Your last line should be gc output.txt. Also use dot notation (.\input.txt) to locate the input.txt file in the current directory. With these fixes, your script works:
$data = Import-Csv .\input.txt -delimiter ";" -Header (1..20)
1..9 | %{$data[0].($_+10) = $data[$_].10}
($data[0] | convertto-csv -delimiter ";" -NoType | select -skip 1) -replace '"' | out-file output.txt
gc output.txt
this seems to do what you want. [grin] it expects that the source lines are all to be combined.
i presume you can handle saving things to a file, so i leave that to you.
what it does ...
fakes reading in a text file
when ready to work with real data, replace the entire #region/#endregion block with a call to Get-Content.
iterates thru the collection by index number
if the line is the 1st, set $NewString to that entire value
else, add the last data item of the line to the existing $NewString value with a trailing ;
the .Where({$_}) filters out any blank items.
display the string
the code ...
#region >>> fake reading in a text file
# in real life, use Get-Content
$InStuff = @'
Klaus;Müller;Straße;PLZ;Ort;;;;;DE12345;
Klaus;Müller;Straße;PLZ;Ort;;;;;DE12345678;
Klaus;Müller;Straße;PLZ;Ort;;;;;DE999999;
Klaus;Müller;Straße;PLZ;Ort;;;;;DE7777777;
Klaus;Müller;Straße;PLZ;Ort;;;;;DE7777779;
Klaus;Müller;Straße;PLZ;Ort;;;;;DE777777987;
Klaus;Müller;Straße;PLZ;Ort;;;;;DE7777779765;
Klaus;Müller;Straße;PLZ;Ort;;;;;DE77777797634;
Klaus;Müller;Straße;PLZ;Ort;;;;;DE7777779763465;
Klaus;Müller;Straße;PLZ;Ort;;;;;DE77777797623435435;
'@ -split [System.Environment]::NewLine
#endregion >>> fake reading in a text file
foreach ($Index in 0..$InStuff.GetUpperBound(0))
{
if ($Index -eq 0)
{
$NewString = $InStuff[$Index]
}
else
{
$NewString += $InStuff[$Index].Split(';').Where({$_})[-1] + ';'
}
}
$NewString
output ...
Klaus;Müller;Straße;PLZ;Ort;;;;;DE12345;DE12345678;DE999999;DE7777777;DE7777779;DE777777987;DE7777779765;DE77777797634;DE7777779763465;DE77777797623435435;
Just in case you don't know how many lines there are going to be on the input file:
$fmt='$1$2'
gc .\input.txt | %{$_ -replace '(^.*;)(.*;$)',$fmt;$fmt='$2'} | sc output.txt -NoNewline
gc output.txt
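To see why the one-liner works, it may help to look at what the two capture groups hold. The sketch below uses shortened, made-up lines: the greedy first group takes everything up to the second-to-last semicolon, and the second group takes the final field with its semicolon. The first line is emitted whole ('$1$2'); every later line contributes only '$2'.

```powershell
# Hypothetical, shortened sample lines.
$sampleLines = 'a;b;;;DE1;', 'a;b;;;DE2;', 'a;b;;;DE3;'

# What each replacement string produces for a single line.
$whole = $sampleLines[0] -replace '(^.*;)(.*;$)', '$1$2'   # the entire line
$last  = $sampleLines[0] -replace '(^.*;)(.*;$)', '$2'     # only the last field

# The same trick as above: switch the replacement after the first line.
$fmt = '$1$2'
$joined = -join ($sampleLines | ForEach-Object { $_ -replace '(^.*;)(.*;$)', $fmt; $fmt = '$2' })
$joined
```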
I have a log file with ^M embedded throughout. I would like to replace the ^M with a single space.
I have tried variations on this:
(Get-Content C:\temp\send.log) | Foreach-Object {$_ -replace "^M", ' '} | Set-Content C:\temp\send.out
The output file contains a newline where each ^M had been, not at all what I was looking for...
The problem I am trying to solve involves examining the last $cnt lines of the file:
$new = Get-Content $fn | Select-Object -Last $cnt;
$new
When I display $new, the ^M are interpreted as CR/LF.
How can I remove/replace the ^M? Thanks for any pointers....
It sounds like ^M is not being replaced by your -replace operation. In a regular expression, ^ anchors the match to the start of the string, so the pattern "^M" tries to replace a capital letter M at the beginning of each line. The ^M shown in the file is how a carriage return is displayed, and it is interpreted as one when the file is opened.
Perhaps try replacing the carriage returns (^M) before displaying the contents:
(Get-Content C:\temp\send.log) |
Foreach-Object {$_ -replace "`r", ' '} |
Set-Content C:\temp\send.out
or
$new = Get-Content $fn | Select-Object -Last $cnt;
$new.replace("`r"," ")
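The difference between the two patterns can be checked on an in-memory string. This is a minimal sketch with a made-up sample line that contains an actual carriage return.

```powershell
# Hypothetical sample: a real CR character sits between the two parts.
$line = "Mail sent`rMore text"

# "^M" is a regex: ^ anchors to the start, so only a leading 'M' is replaced.
$anchored = $line -replace "^M", ' '

# "`r" is the carriage return itself, which is what ^M stands for.
$withoutCr = $line -replace "`r", ' '

$anchored
$withoutCr
```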
Could this be as simple as escaping the ^ character? If you only need the last $count lines of the file, you can use the -Tail parameter on Get-Content. Depending on whether you need to match ^M case-sensitively, you might opt for -creplace instead of -replace.
Get-Content $inputfile -Tail $count | ForEach-Object { $_ -creplace '\^M',' ' } | Set-Content $outputfile
This isn't an answer, but since you asked for a few pointers, this might help set things straight.
Try this:
$new = Get-Content $fn | Select-Object -Last $cnt;
$new
$new.gettype()
$new[0].gettype()
I expect you're going to see that $new is an array of objects, and that $new[0] is a string. I'm going to suggest that $new[0] doesn't contain CR or LF or CRLF or anything like that. And I'm going to suggest that, when you ask for the display of $new in its entirety, what you are getting is each string ($new[0] followed by $new[1] ...) with CRLF inserted as a separator.
If I'm right, replacing CR or CRLF with space isn't going to do you any good at all. It's the CRLFs that are being inserted on output to a file that are preventing you from succeeding.
This is as far as I got towards solving your problem.
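The claim that the individual strings contain no CR can be tested, and the test also hints at a way forward. The sketch below assumes the log contains bare CR characters (shown as ^M) and uses a made-up temporary file: without -Raw, Get-Content has already split on them, whereas with -Raw they are still in the text and can be replaced.

```powershell
# Temporary file with one bare CR and one regular CRLF line break.
$path = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllText($path, "a`rb`r`nc")

# Line-by-line reading: the bare CR already acted as a line break.
$lines = @(Get-Content $path)
$lineCount = $lines.Count
$linesWithCr = @($lines | Where-Object { $_ -match '\r' }).Count

# Raw reading: replace only the CRs that are not part of a CRLF pair.
$text = (Get-Content -Raw $path) -replace '\r(?!\n)', ' '

Remove-Item $path
$lineCount; $linesWithCr; $text
```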
I'm trying to write a script to find all the periods in the first 11 characters or last 147 characters of each line (lines are fixed width of 193, so I'm attempting to ignore characters 12 through 45).
First I want a script that will just find all the periods in the first or last part of each line. If it finds any, I would like to replace those periods with 0's, while leaving the periods in the 12th through 45th characters in place. It would scan all the *.dat files in the directory and create period-free copies in a subfolder. So far I have:
$data = get-content "*.dat"
foreach($line in $data)
{
$line.substring(0,12)
$line.substring(46,147)
}
Then I run this with > Output.txt then do a select-string Output.txt -pattern ".". As you can see I'm a long ways from my goal as presently my program is mashing all the files together, and I haven't figured out how to do any replacement yet.
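# $OutputDir is assumed to be defined elsewhere and to point at an existing subfolder.
Get-Item *.dat |
ForEach-Object {
$file = $_
$_ |
Get-Content |
ForEach-Object {
# Assumes 193-character lines (12 + 34 + 147), matching the question's Substring calls.
$beginning = $_.Substring(0,12) -replace '\.','0'
$middle = $_.Substring(12,34)
$end = $_.Substring(46,147) -replace '\.','0'
'{0}{1}{2}' -f $beginning,$middle,$end
} |
Set-Content -Path (Join-Path $OutputDir $file.Name)
}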
You can use the powershell -replace operator to replace the "." with "0". Then use substring as you do to build up the three portions of the string you're interested in to get the updated string. This will output an updated line for each line of your input.
$data = get-content "*.dat"
foreach($line in $data)
{
($line.SubString(0,12) -replace "\.","0") + $line.SubString(12,34) + ($line.substring(46,147) -replace "\.","0")
}
Note that the -replace operator performs a regular expression match and the "." is a special regular expression character so you need to escape it with a "\".
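The effect of the escape is easy to see on a short, made-up string. [regex]::Escape is shown as an alternative for cases where the search text is not known in advance.

```powershell
$sample = '1.5.x'

# Unescaped, '.' matches every character.
$unescaped = $sample -replace '.', '0'

# Escaped, '\.' matches only literal periods.
$escaped = $sample -replace '\.', '0'

# [regex]::Escape builds the escaped pattern for you.
$viaEscape = $sample -replace [regex]::Escape('.'), '0'

$unescaped; $escaped; $viaEscape
```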
I am trying to replace the following string with PowerShell:
...
("
Intel(R) Network Connections 14.2.100.0
","
14.2.100.0
")
...
The code that I use is:
Get-Content $logfilepath |
Foreach-Object { $_ -replace '`r`n`r`n', 'xx'} |
Set-Content $logfilepath_new
But I have had no success. Can someone tell me where the error is?
First, you are using single quotes in the replace string -
'`r`n`r`n'
that means they are treated verbatim and not as newline characters, so you have to use -
"`r`n`r`n"
To replace, read the file as string and use the Replace() method
$content = [System.IO.File]::ReadAllText("test.txt")
$content.Replace("`r`n`r`n","xx")
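The quoting difference can be verified without a file. In this minimal sketch (the sample text is made up), the single-quoted version is eight literal characters, while the double-quoted version is two CRLF pairs.

```powershell
$single = '`r`n`r`n'   # literal backticks and letters
$double = "`r`n`r`n"   # CR LF CR LF

$singleLength = $single.Length
$doubleLength = $double.Length

# Only the double-quoted search string matches real line breaks.
$text = "a`r`n`r`nb"
$unchanged = $text.Replace($single, 'xx')
$replaced  = $text.Replace($double, 'xx')

$singleLength; $doubleLength; $replaced
```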
Get-Content returns an array of lines, so CRLF is essentially your delimiter. Two CRLF sequences back to back would be interpreted as the end of the current line, followed by an empty line, so no line (object) should contain '`r`n`r`n'. A multi-line regex replace would probably be a better choice.
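One way to do such a multi-line replace is to read the file as a single string with -Raw, so that the line breaks are still present when -replace runs. This is a sketch on a made-up temporary file, not the original log.

```powershell
# Temporary file containing one double CRLF and one single CRLF.
$path = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllText($path, "first`r`n`r`nsecond`r`nthird")

# Line by line, no element contains a line break, so nothing can match.
$matchingLines = @(Get-Content $path | Where-Object { $_ -match '\r\n' }).Count

# As one string, the double CRLF is found and replaced.
$result = (Get-Content -Raw $path) -replace '\r\n\r\n', 'xx'

Remove-Item $path
$matchingLines; $result
```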
As an alternate method using PS cmdlets:
Get-Content $logfilepath |
Foreach-Object -Begin { $content="" } -Process { $content += $_ ; $content += "xx" } -End { $content } |
Set-Content $logfilepath_new
I used the following code to replace somestring with newline:
$nl = [System.Environment]::NewLine
$content = $content.Replace( somestring, $nl )