Remove Top Line of Text File with PowerShell - powershell

I am trying to just remove the first line of about 5000 text files before importing them.
I am still very new to PowerShell so not sure what to search for or how to approach this. My current concept using pseudo-code:
set-content file (get-content unless line contains amount)
However, I can't seem to figure out how to do something like contains.

While I really admire the answer from @hoge, both for its very concise technique and for the wrapper function that generalizes it, and I encourage upvotes for it, I am compelled to comment on the other two answers that use temp files (it gnaws at me like fingernails on a chalkboard!).
Assuming the file is not huge, you can force the pipeline to operate in discrete sections--thereby obviating the need for a temp file--with judicious use of parentheses:
(Get-Content $file | Select-Object -Skip 1) | Set-Content $file
... or in short form:
(gc $file | select -Skip 1) | sc $file
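To see the parenthesized form in action, here is a minimal sketch. The demo file name and its three lines are made up for illustration; only Get-Content, Select-Object and Set-Content come from the answer above.

```powershell
# Hypothetical demo file in the temp directory (name chosen for this example only)
$file = Join-Path ([System.IO.Path]::GetTempPath()) 'remove-top-line-demo.txt'
Set-Content -Path $file -Value 'header', 'row 1', 'row 2'

# The parentheses make Get-Content run to completion (and release the file)
# before Set-Content opens the same file for writing
(Get-Content $file | Select-Object -Skip 1) | Set-Content $file

# Show what is left in the file
Get-Content $file
```

Without the parentheses, Set-Content would try to open the file while Get-Content is still reading it.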

It is not the most efficient in the world, but this should work:
get-content $file |
select -Skip 1 |
set-content "$file-temp"
move "$file-temp" $file -Force

Using variable notation, you can do it without a temporary file:
${C:\file.txt} = ${C:\file.txt} | select -skip 1
function Remove-Topline ( [string[]]$path, [int]$skip=1 ) {
if ( -not (Test-Path $path -PathType Leaf) ) {
throw "invalid filename"
}
ls $path |
% { iex "`${$($_.fullname)} = `${$($_.fullname)} | select -skip $skip" }
}

I just had to do the same task, and gc | select ... | sc took over 4 GB of RAM on my machine while reading a 1.6 GB file. It didn't finish for at least 20 minutes after reading the whole file in (as reported by Read Bytes in Process Explorer), at which point I had to kill it.
My solution was to use a more .NET approach: StreamReader + StreamWriter.
See this answer for a great answer discussing the perf: In Powershell, what's the most efficient way to split a large text file by record type?
Below is my solution. Yes, it uses a temporary file, but in my case, it didn't matter (it was a freaking huge SQL table creation and insert statements file):
PS> (measure-command{
$i = 0
$ins = New-Object System.IO.StreamReader "in/file/pa.th"
$outs = New-Object System.IO.StreamWriter "out/file/pa.th"
while( !$ins.EndOfStream ) {
$line = $ins.ReadLine();
if( $i -ne 0 ) {
$outs.WriteLine($line);
}
$i = $i+1;
}
$outs.Close();
$ins.Close();
}).TotalSeconds
It returned:
188.1224443

Inspired by AASoft's answer, I went out to improve it a bit more:
Avoid the loop variable $i and the comparison with 0 in every loop
Wrap the execution into a try..finally block to always close the files in use
Make the solution work for an arbitrary number of lines to remove from the beginning of the file
Use a variable $p to reference the current directory
These changes lead to the following code:
$p = (Get-Location).Path
(Measure-Command {
# Number of lines to skip
$skip = 1
$ins = New-Object System.IO.StreamReader ($p + "\test.log")
$outs = New-Object System.IO.StreamWriter ($p + "\test-1.log")
try {
# Skip the first N lines, but allow for fewer than N, as well
for( $s = 1; $s -le $skip -and !$ins.EndOfStream; $s++ ) {
$ins.ReadLine()
}
while( !$ins.EndOfStream ) {
$outs.WriteLine( $ins.ReadLine() )
}
}
finally {
$outs.Close()
$ins.Close()
}
}).TotalSeconds
The first change brought the processing time for my 60 MB file down from 5.3s to 4s. The rest of the changes is more cosmetic.
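If you want the same streaming idea as a reusable command that replaces the file in place, a sketch might look like the following. The function name Remove-TopLines and the .tmp suffix are my own assumptions, not part of the answers above. Also note that StreamWriter writes UTF-8 without a BOM by default, so the output encoding may differ from the input.

```powershell
function Remove-TopLines {
    param(
        [Parameter(Mandatory = $true)][string]$Path,
        [int]$Skip = 1   # number of leading lines to drop
    )
    # .NET resolves relative paths against the process directory, so resolve first
    $full = (Resolve-Path -LiteralPath $Path).ProviderPath
    $temp = "$full.tmp"   # hypothetical temp-file naming, adjust as needed

    $reader = New-Object System.IO.StreamReader $full
    try {
        $writer = New-Object System.IO.StreamWriter $temp
        try {
            # Skip the first $Skip lines, tolerating files shorter than that
            for ($s = 0; $s -lt $Skip -and -not $reader.EndOfStream; $s++) {
                $null = $reader.ReadLine()
            }
            # Copy the remaining lines one at a time
            while (-not $reader.EndOfStream) {
                $writer.WriteLine($reader.ReadLine())
            }
        }
        finally { $writer.Dispose() }
    }
    finally { $reader.Dispose() }

    # Replace the original with the shortened copy
    Move-Item -LiteralPath $temp -Destination $full -Force
}
```

It could then be called as Remove-TopLines -Path .\test.log -Skip 1.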

$x = get-content $file
$x[1..$x.count] | set-content $file
Just that much. Long boring explanation follows. Get-content returns an array. We can "index into" array variables, as demonstrated in this and other Scripting Guys posts.
For example, if we define an array variable like this,
$array = @("first item","second item","third item")
so $array returns
first item
second item
third item
then we can "index into" that array to retrieve only its 1st element
$array[0]
or only its 2nd
$array[1]
or a range of index values from the 2nd through the last.
$array[1..$array.count]

I just learned from a website:
Get-ChildItem *.txt | ForEach-Object { (get-Content $_) | Where-Object {(1) -notcontains $_.ReadCount } | Set-Content -path $_ }
Or you can use the aliases to make it short, like:
gci *.txt | % { (gc $_) | ? { (1) -notcontains $_.ReadCount } | sc -path $_ }

Another approach to remove the first line from a file, using the multiple assignment technique:
$firstLine, $restOfDocument = Get-Content -Path $filename
$modifiedContent = $restOfDocument
$modifiedContent | Out-String | Set-Content $filename

skip didn't work, so my workaround is:
$LinesCount = $(get-content $file).Count
get-content $file |
select -Last $($LinesCount-1) |
set-content "$file-temp"
move "$file-temp" $file -Force

Following on from Michael Soren's answer.
If you want to edit all .txt files in the current directory and remove the first line from each.
Get-ChildItem (Get-Location).Path -Filter *.txt |
Foreach-Object {
(Get-Content $_.FullName | Select-Object -Skip 1) | Set-Content $_.FullName
}

For smaller files you could use this:
& C:\windows\system32\more +1 oldfile.csv > newfile.csv | out-null
... but it's not very effective at processing my example file of 16MB. It doesn't seem to terminate and release the lock on newfile.csv.

Related

powershell: delete specific line from x to x

I'm new to PowerShell and I absolutely don't get it ...
Just want to delete line 7 to 2500 of a text file. First 6 lines should be untouched.
With linux bash everything is so easy, just:
sed -i '7,2500d' $file
Did not find any solution for mighty powershell :-(
Thank you.
Use Get-Content to read the contents of the file into a variable. The variable can be indexed like a regular PowerShell array. Get the parts of the array you need then pipe the variable into Set-Content to write back to the file.
$file = Get-Content test.log
$keep = $file[0..1] + $file[7..($file.Count - 1)]
$keep | Set-Content test.log
Using this as the contents of the file test.log:
One
Two
Three
Four
Five
Six
Seven
Eight
Nine
This script will output the following into test.log (overwriting the contents):
One
Two
Eight
Nine
In your case, you will want to use $file[0..5] + $file[2500..($file.Count - 1)].
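The index arithmetic is easy to get wrong by one, so here is a sketch that takes 1-based line numbers the way sed does. The function name Remove-LineRange and its parameter names are hypothetical, and like the answer above it reads the whole file into memory.

```powershell
function Remove-LineRange {
    param(
        [Parameter(Mandatory = $true)][string]$Path,
        [Parameter(Mandatory = $true)][int]$First,  # first line to delete (1-based)
        [Parameter(Mandatory = $true)][int]$Last    # last line to delete (1-based, inclusive)
    )
    # @(...) keeps the result an array even for a one-line file
    $lines = @(Get-Content -LiteralPath $Path)

    # Keep every line whose 1-based number falls outside First..Last
    $keep = @(for ($i = 0; $i -lt $lines.Count; $i++) {
        $lineNumber = $i + 1
        if ($lineNumber -lt $First -or $lineNumber -gt $Last) { $lines[$i] }
    })

    Set-Content -LiteralPath $Path -Value $keep
}
```

With this, the sed command from the question would correspond to Remove-LineRange -Path $file -First 7 -Last 2500.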
To remove a series of lines in a text file, you could do something like this:
$fileIn = 'D:\Test\File1.txt'
$fileOut = 'D:\Test\File2.txt'
$startRemove = 7
$endRemove = 2500
$currentLine = 1
# needs .NET 4
$newText = foreach ($line in [System.IO.File]::ReadLines($fileIn)) {
if ($currentLine -lt $startRemove -or $currentLine -gt $endRemove) { $line}
$currentLine++
}
$newText | Set-Content -Path $fileOut -Force
Or, if your version of .NET is below 4.0
$currentLine = 1
$reader = [System.IO.File]::OpenText($fileIn)
$newText = while($null -ne ($line = $reader.ReadLine())) {
if ($currentLine -lt $startRemove -or $currentLine -gt $endRemove) { $line }
$currentLine++
}
$reader.Dispose()
$newText | Set-Content -Path $fileOut -Force
Select-object -index takes an array, so:
1..10 > file
(get-content file) | select -index (0..5) | set-content file
get-content file
1
2
3
4
5
6
Or:
(cat file)[0..5] | set-content file

How to modify contents of a pipe-delimited text file with PowerShell

I have a pipe-delimited text file. The file contains "records" of various types. I want to modify certain columns for each record type. For simplicity, let's say there are 3 record types: A, B, and C. A has 3 columns, B has 4 columns, and C has 5 columns. For example, we have:
A|stuff|more_stuff
B|123|other|x
C|something|456|stuff|more_stuff
B|78903|stuff|x
A|1|more_stuff
I want to append the prefix "P" to all desired columns. For A, the desired column is 2. For B, the desired column is 3. For C, the desired column is 4.
So, I want the output to look like:
A|Pstuff|more_stuff
B|123|Pother|x
C|something|456|Pstuff|more_stuff
B|78903|Pstuff|x
A|P1|more_stuff
I need to do this in PowerShell. The file could be very large. So, I'm thinking about going with the File-class of .NET. If it were a simple string replacement, I would do something like:
$content = [System.IO.File]::ReadAllText("H:\test_modify_contents.txt").Replace("replace_text","something_else")
[System.IO.File]::WriteAllText("H:\output_file.txt", $content)
But, it's not so simple in my particular situation. So, I'm not even sure if ReadAllText and WriteAllText is the best solution. Any ideas on how to do this?
I would ConvertFrom-Csv so you can check each line as an object. On this code, I did add a header, but mainly for code readability. The header is cut out of the output on the last line anyway:
# $input is an automatic variable in PowerShell, so the path gets a different name
$inFile = "H:\test_modify_contents.txt"
$output = "H:\output_file.txt"
$data = Get-Content -Path $inFile | ConvertFrom-Csv -Delimiter '|' -Header 'Column1','Column2','Column3','Column4','Column5'
$data | % {
If ($_.Column5) {
#type C:
$_.Column4 = "P$($_.Column4)"
} ElseIf ($_.Column4) {
#type B:
$_.Column3 = "P$($_.Column3)"
} Else {
#type A:
$_.Column2 = "P$($_.Column2)"
}
}
$data | Select Column1,Column2,Column3,Column4,Column5 | ConvertTo-Csv -Delimiter '|' -NoTypeInformation | Select-Object -Skip 1 | Set-Content -Path $output
It does add extra | for the type A and B lines. Output:
"A"|"Pstuff"|"more_stuff"||
"B"|"123"|"Pother"|"x"|
"C"|"something"|"456"|"Pstuff"|"more_stuff"
"B"|"78903"|"Pstuff"|"x"|
"A"|"P1"|"more_stuff"||
If your files are large, then reading the complete file contents at once using Import-Csv or a ReadAll method is probably not a good idea. I would use the Get-Content cmdlet with the -ReadCount parameter, which streams the file one row at a time, and then use a regex for the processing. Something like this:
Get-Content your_in_file.txt -ReadCount 1 | % {
$_ -replace '^(A\||B\|[^\|]+\||C\|[^\|]+\|[^\|]+\|)(.*)$', '$1P$2'
} | Set-Content your_out_file.txt
EDIT:
This version should output faster:
$d = Get-Date
Get-Content input.txt -ReadCount 1000 | % {
$_ | % {
$_ -replace '^(A\||B\|[^\|]+\||C\|[^\|]+\|[^\|]+\|)(.*)$', '$1P$2'
} | Add-Content output.txt
}
(New-TimeSpan $d (Get-Date)).TotalMilliseconds
For me this processed 50k rows in 350 milliseconds. You can probably get more speed by tweaking the -ReadCount value to find the ideal amount.
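If the regex becomes hard to maintain as more record types are added, one alternative is to split each line and look up the column to prefix in a table. This is only a sketch of that idea, not what the answer above does: the function name Add-ColumnPrefix and the $prefixColumn table are hypothetical, with the column numbers taken from the question (A = 2, B = 3, C = 4).

```powershell
# Record type -> 1-based column that should receive the prefix (from the question)
$prefixColumn = @{ A = 2; B = 3; C = 4 }

function Add-ColumnPrefix {
    param(
        [Parameter(Mandatory = $true)][string]$Line,
        [Parameter(Mandatory = $true)][hashtable]$ColumnByType,
        [string]$Prefix = 'P'
    )
    $fields = $Line.Split('|')
    $type = $fields[0]
    if ($ColumnByType.ContainsKey($type)) {
        $index = $ColumnByType[$type] - 1          # convert to a 0-based index
        if ($index -lt $fields.Length) {
            $fields[$index] = $Prefix + $fields[$index]
        }
    }
    # Lines with an unknown record type pass through unchanged
    $fields -join '|'
}

# Usage with three of the sample records from the question
'A|stuff|more_stuff', 'B|123|other|x', 'C|something|456|stuff|more_stuff' |
    ForEach-Object { Add-ColumnPrefix -Line $_ -ColumnByType $prefixColumn }
```

The function works on one line at a time, so it can sit inside either the Get-Content pipeline or a StreamReader loop.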
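Given the large input file, I would not use either ReadAllText or Get-Content.
ReadAllText reads the entire file into memory, and so does Get-Content whenever its output is collected into a variable instead of being streamed down the pipeline.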
Consider using something along the lines of
$filename = ".\input2.csv"
$outfilename = ".\output2.csv"
function ProcessFile($inputfilename, $outputfilename)
{
$reader = [System.IO.File]::OpenText($inputfilename)
$writer = New-Object System.IO.StreamWriter $outputfilename
$record = $reader.ReadLine()
while ($record -ne $null)
{
$writer.WriteLine(($record -replace '^(A\||B\|[^\|]+\||C\|[^\|]+\|[^\|]+\|)(.*)$', '$1P$2'))
$record = $reader.ReadLine()
}
$reader.Close()
$reader.Dispose()
$writer.Close()
$writer.Dispose()
}
ProcessFile $filename $outfilename
EDIT: After testing all the suggestions on this page, I have borrowed the regex from Dave Sexton, and this is the fastest implementation. It processes a 1 GB+ file in 175 seconds. All other implementations are significantly slower on large input files.

Powershell - reading ahead and While

I have a text file in the following format:
.....
ENTRY,PartNumber1,,,
FIELD,IntCode,123456
...
FIELD,MFRPartNumber,ABC123,,,
...
FIELD,XPARTNUMBER,ABC123
...
FIELD,InternalPartNumber,3214567
...
ENTRY,PartNumber2,,,
...
...
the ... indicates there is other data between these fields. The ONLY thing I can be certain of is that the field starting with ENTRY is a new set of records. The rows starting with FIELD can be in any order, and not all of them may be present in each group of data.
I need to:
Read in a chunk of data.
Search for any field matching the string ABC123.
If ABC123 is found, search for the existence of the InternalPartNumber field and return that row of data.
I have not seen a way to use Get-Content that can read in a variable number of rows as a set & be able to search it.
Here is the code I currently have, which will read a file, searching for a string & replacing it with another. I hope this can be modified to be used in this case.
$ftype = "*.txt"
$fnames = gci -Path $filefolder1 -Filter $ftype -Recurse|% {$_.FullName}
$mfgPartlist = Import-Csv -Path "C:\test\mfrPartList.csv"
foreach ($file in $fnames) {
$contents = Get-Content -Path $file
foreach ($partnbr in $mfgPartlist) {
$oldString = $partnbr.OldValue
$newString = $partnbr.NewValue
if (Select-String -Path $file -SimpleMatch $oldString -Debug -Quiet) {
$stringData = $contents -imatch $oldString
$stringData = $stringData -replace "[\n\r]","|"
foreach ($dataline in $stringData) {
$file +"|"+$stringData+"|"+$oldString+"|"+$newString|Out-File "C:\test\Datachanges.txt" -Width 2000 -Append
}
$contents = $contents -replace $oldString, $newString
Set-Content -Path $file -Value $contents
}
}
}
Is there a way to read & search a text file in "chunks" using Powershell? Or to do a Read-ahead & determine what to search?
Assuming your file isn't too big to read into memory all at once:
$Text = Get-Content testfile.txt -Raw
($Text -split '(?ms)^(?=ENTRY)') |
foreach {
if ($_ -match '(?ms)^FIELD\S+ABC123')
{$_ -replace '(?ms).+(^Field\S+InternalPartNumber.+?$).+','$1'}
}
FIELD,InternalPartNumber,3214567
That reads the entire file in as a single multiline string, and then splits it at the beginning of any line that starts with 'ENTRY'. Then it tests each segment for a FIELD line that contains 'ABC123', and if it does, removes everything except the FIELD line for the InternalPartNumber.
This is not my best work as I have just got back from vacation. You could use a while loop reading the text and set an entry flag to gobble up the text in chunks. However if your files are not too big then you could just read up the text file at once and use regex to split up the chunks and then process accordingly.
$pattern = "ABC123"
$matchedRowToReturn = "InternalPartNumber"
$fileData = Get-Content "d:\temp\test.txt" | Where-Object{$_ -match '^(entry|field)'} | Out-String
$parts = $fileData | Select-String '(?smi)(^Entry).*?(?=^Entry|\Z)' -AllMatches | Select-Object -ExpandProperty Matches | Select-Object -ExpandProperty Value
$parts | Where-Object{$_ -match $pattern} | Select-String "$matchedRowToReturn.*$" | Select-Object -ExpandProperty Matches | Select-Object -ExpandProperty Value
What this will do is read in the text file, drop any lines that are not entry or field related, join the rest into one long string, and split it up into chunks that start with lines that begin with the word "Entry".
Then we drop those "parts" that do not contain the $pattern. Of the remaining that match extract the InternalPartNumber line and present.
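The while-loop-with-a-flag idea mentioned above can also be written as a pipeline function that collects lines until the next ENTRY row, so the file never has to be in memory all at once. This is a sketch under assumptions: the function name Find-EntryField and its parameter names are hypothetical, and it assumes comma-separated rows that start with ENTRY or FIELD exactly as in the question. The -like test ignores case, while the Contains check for the pattern is case-sensitive.

```powershell
function Find-EntryField {
    param(
        [Parameter(ValueFromPipeline = $true)][string]$Line,
        [string]$Pattern = 'ABC123',
        [string]$FieldName = 'InternalPartNumber'
    )
    begin {
        $chunk = New-Object 'System.Collections.Generic.List[string]'
        # If any FIELD row in the collected chunk contains $Pattern,
        # emit the chunk's $FieldName row(s); then reset the chunk
        $flush = {
            $hit = $chunk | Where-Object { $_ -like 'FIELD,*' -and $_.Contains($Pattern) }
            if ($hit) { $chunk | Where-Object { $_ -like "FIELD,$FieldName,*" } }
            $chunk.Clear()
        }
    }
    process {
        # A row starting with ENTRY closes the previous chunk
        if ($Line -like 'ENTRY,*') { & $flush }
        $chunk.Add($Line)
    }
    end {
        # The last chunk has no following ENTRY row, so flush it here
        & $flush
    }
}
```

Usage would be along the lines of Get-Content testfile.txt | Find-EntryField -Pattern 'ABC123'.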

Get all lines containing a string in a huge text file - as fast as possible?

In Powershell, how to read and get as fast as possible the last line (or all the lines) which contains a specific string in a huge text file (about 200000 lines / 30 MBytes) ?
I'm using :
get-content myfile.txt | select-string -pattern "my_string" -encoding ASCII | select -last 1
But it's very very long (about 16-18 seconds).
I did tests without the last pipe "select -last 1", but it's the same time.
Is there a faster way to get the last occurrence (or all occurrences) of a specific string in a huge file?
Perhaps it's the needed time ...
Or is there any possibility to read the file faster from the end, as I want the last occurrence?
Thanks
Try this:
get-content myfile.txt -ReadCount 1000 |
foreach { $_ -match "my_string" }
That will read your file in chunks of 1000 records at a time, and find the matches in each chunk. This gives you better performance because you aren't wasting a lot of cpu time on memory management, since there's only 1000 lines at a time in the pipeline.
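Since the question mainly asks for the last occurrence, the chunked read can be combined with a variable that remembers only the most recent hit. This is a minimal sketch: the function name Get-LastMatch and the -ChunkSize parameter are hypothetical, and -match treats the pattern as a regular expression.

```powershell
function Get-LastMatch {
    param(
        [Parameter(Mandatory = $true)][string]$Path,
        [Parameter(Mandatory = $true)][string]$Pattern,
        [int]$ChunkSize = 1000
    )
    $last = $null
    Get-Content -LiteralPath $Path -ReadCount $ChunkSize | ForEach-Object {
        # $_ is one chunk of lines; -match on an array returns the matching elements
        $hits = @(@($_) -match $Pattern)
        # Remember only the final hit of this chunk, overwriting earlier ones
        if ($hits.Count -gt 0) { $last = $hits[-1] }
    }
    # Returns $null when nothing matched
    $last
}
```

For example: Get-LastMatch -Path myfile.txt -Pattern 'my_string'.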
Have you tried:
gc myfile.txt | % { if($_ -match "my_string") {write-host $_}}
Or, you can create a "grep"-like function:
function grep($f,$s) {
gc $f | % {if($_ -match $s){write-host $_}}
}
Then you can just issue: grep $myfile.txt $my_string
$reader = New-Object System.IO.StreamReader("myfile.txt")
$lines = @()
if ($reader -ne $null) {
while (!$reader.EndOfStream) {
$line = $reader.ReadLine()
if ($line.Contains("my_string")) {
$lines += $line
}
}
}
$lines | Select-Object -Last 1
Have you tried using [System.IO.File]::ReadAllLines();? This method is more "raw" than the PowerShell-esque method, since we're plugging directly into the Microsoft .NET Framework types.
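# ReadAllLines needs a path; resolve it first because .NET ignores PowerShell's current location
$Lines = [System.IO.File]::ReadAllLines((Resolve-Path 'myfile.txt').ProviderPath)
# Test each line separately so the matching lines themselves are returned
$Lines | Where-Object { [Regex]::IsMatch($_, 'my_string_pattern') }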
I wanted to extract the lines that contained "failed" and also write these lines to a new file. Here is the full command for this:
get-content log.txt -ReadCount 1000 |
foreach { $_ -match "failed" } | Out-File C:\failes.txt

How can I search the first line and the last line in a text file?

I need to only search the 1st line and last line in a text file to find a "-" and remove it.
How can I do it?
I tried select-string, but I don't know to find the 1st and last line and only remove "-" from there.
Here is what the text file looks like:
% 01-A247M15 G70
N0001 G30 G17 X-100 Y-100 Z0
N0002 G31 G90 X100 Y100 Z45
N0003 ; --PART NO.: NC-HON.PHX01.COVER-SHOE.DET-1000.050
N0004 ; --TOOL: 8.55 X .3937
N0005 ;
N0006 % 01-A247M15 G70
Something like this?
$1 = Get-Content C:\work\test\01.I
$1 | select-object -index 0, ($1.count-1)
Ok, so after looking at this for a while, I decided there had to be a way to do this with a one liner. Here it is:
(gc "c:\myfile.txt") | % -Begin {$test = (gc "c:\myfile.txt" | select -first 1 -last 1)} -Process {if ( $_ -eq $test[0] -or $_ -eq $test[-1] ) { $_ -replace "-" } else { $_ }} | Set-Content "c:\myfile.txt"
Here is a breakdown of what this is doing:
First, the aliases, for those not familiar. I only used them because the command is long enough as it is, so this helps keep things manageable:
gc means Get-Content
% means ForEach-Object
$_ is for the current pipeline value (this isn't an alias, but I thought I would define it since you said you were new)
Ok, now here is what is happening in this:
(gc "c:\myfile.txt") | --> Gets the content of c:\myfile.txt and sends it down the line
% --> Does a foreach loop (goes through each item in the pipeline individually)
-Begin {$test = (gc "c:\myfile.txt" | select -first 1 -last 1)} --> This is a begin block, it runs everything here before it goes onto the pipeline stuff. It is loading the first and last line of c:\myfile.txt into an array so we can check for first and last items
-Process {if ( $_ -eq $test[0] -or $_ -eq $test[-1] ) --> This runs a check on each item in the pipeline, checking if it's the first or the last item in the file
{ $_ -replace "-" } else { $_ } --> if it's the first or last, it does the replacement, if it's not, it just leaves it alone
| Set-Content "c:\myfile.txt" --> This puts the new values back into the file.
Please see the following sites for more information on each of these items:
Get-Content uses
Get-Content definition
Foreach
The Pipeline
Begin and Process part of the Foreach (these are usually for custom functions, but they work in the foreach loop as well)
If ... else statements
Set-Content
So I was thinking about what if you wanted to do this to many files, or wanted to do this often. I decided to make a function that does what you are asking. Here is the function:
function Replace-FirstLast {
[CmdletBinding()]
param(
[Parameter( `
Position=0, `
Mandatory=$true)]
[String]$File,
[Parameter( `
Position=1, `
Mandatory=$true)]
[ValidateNotNull()]
[regex]$Regex,
[Parameter( `
position=2, `
Mandatory=$false)]
[string]$ReplaceWith=""
)
Begin {
$lines = Get-Content $File
} #end begin
Process {
foreach ($line in $lines) {
if ( $line -eq $lines[0] ) {
$lines[0] = $line -replace $Regex,$ReplaceWith
} #end if
if ( $line -eq $lines[-1] ) {
$lines[-1] = $line -replace $Regex,$ReplaceWith
}
} #end foreach
}#End process
end {
$lines | Set-Content $File
}#end end
} #end function
This will create a command called Replace-FirstLast. It would be called like this:
Replace-FirstLast -File "C:\myfiles.txt" -Regex "-" -ReplaceWith "NewText"
The -Replacewith is optional, if it is blank it will just remove (default value of ""). The -Regex is looking for a regular expression to match your command. For information on placing this into your profile check this article
Please note: If your file is very large (several GBs), this isn't the best solution. This would cause the whole file to live in memory, which could potentially cause other issues.
try:
$txt = get-content c:\myfile.txt
$txt[0] = $txt[0] -replace '-'
$txt[$txt.length - 1 ] = $txt[$txt.length - 1 ] -replace '-'
$txt | set-content c:\myfile.txt
You can use the select-object cmdlet to help you with this, since get-content basically spits out a text file as one huge array.
Thus, you can do something like this
get-content "path_to_my_awesome_file" | select -first 1 -last 1
To remove the dash after that, you can use the -replace operator to find the dash and remove it. This is better than using the System.String.Replace(...) method because it can match regular expressions and operate on whole arrays of strings too!
That would look like:
# gc = Get-Content. The parens tell Powershell to do whatever's inside of it
# then treat it like a variable.
(gc "path_to_my_awesome_file" | select -first 1 -last 1) -Replace '-',''
If your file is very large you might not want to read the whole file to get the last line. gc -Tail will get the last line very quickly for you.
function GetFirstAndLastLine($path){
return New-Object PSObject -Property @{
First = Get-Content $path -TotalCount 1
Last = Get-Content $path -Tail 1
}
}
GetFirstAndLastLine "u_ex150417.log"
I tried this on a 20 gb log file and it returned immediately. Reading the file takes hours.
You will still need to read the whole file if you want to keep all existing content and only remove something from the end. Using -Tail is a quick way to check whether it is there.
I hope it helps.
A cleaner answer to the above:
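# Read the file first so the line count is known; @(...) keeps a one-line file an array
$Awesome_file = @(Get-Content "path_to_ridiculously_excellent_file")
$Line_number_were_on = 0
$Awesome_file | %{
$Line = $_
# Strip dashes only on the first and the last line
if ($Line_number_were_on -eq 0 -or $Line_number_were_on -eq ($Awesome_file.Count - 1))
{ $Line -Replace '-','' }
else
{ $Line } ;
$Line_number_were_on++
}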
I like one-liners, but I find that readability tends to suffer sometimes when I put terseness over function. If what you're doing is going to be part of a script that other people will be reading/maintaining, readability might be something to consider.
Following Nick's answer: I do need to do this on all text files in the directory tree and this is what I'm using now:
Get-ChildItem -Path "c:\work\test" -Filter *.i | where { !$_.PSIsContainer } | % {
$txt = Get-Content $_.FullName;
$txt[0] = $txt[0] -replace '-';
$txt[$txt.length - 1 ] = $txt[$txt.length - 1 ] -replace '-';
$txt | Set-Content $_.FullName
}
and it looks like it's working well now.
Simple process:
Replace $file_txt with your filename
Get-Content $file_txt | Select-Object -last 1
I was recently searching for comments in the last line of .bat files. It seems to mess up the error code of previous commands. I found this useful for searching for a pattern in the last line of files. Pspath is a hidden property that get-content outputs. If I used select-string, I would lose the filename. *.bat gets passed as -filter for speed.
get-childitem -recurse . *.bat | get-content -tail 1 | where { $_ -match 'rem' } |
select pspath
PSPath
------
Microsoft.PowerShell.Core\FileSystem::C:\users\js\foo\file.bat