How to delete rows in file under certain condition? - powershell

I've got the file 'Test.txt', which is updated automatically. Every hour a new value is added to it, like:
Some text 1:57
Some text 2:57
Some text 3:57
Some text 4:57
And I need to check when this file is more than 100 MB in size, and then delete the FIRST half of the file. I mean, 'Some text 1:57' and 'Some text 2:57' should be deleted in this case, if the file has 4 values.
For now I have the following code, where I can see the current size in bytes.
$TestFileSize = (Get-Item C:\Test.txt).Length
if ($TestFileSize -gt 100MB) {
# --- here the code that should delete the first 50 rows if file has 100 and so on.
}
Any advice? Thanks!

There are several ways of doing this. Here are two:
$file = 'C:\Test.txt'
$TestFileSize = (Get-Item -Path $file).Length
if ($TestFileSize -gt 100MB) {
# --- here the code that should delete the first 50 rows if file has 100 and so on.
$nlines = @([System.IO.File]::ReadAllLines($file)).Count
(Get-Content -Path $file -Tail ([math]::Ceiling($nlines / 2))) |
Set-Content -Path $file
}
or
$file = 'C:\Test.txt'
$TestFileSize = (Get-Item -Path $file).Length
if ($TestFileSize -gt 100MB) {
$content = Get-Content -Path $file
$nlines = @($content).Count
$content[([math]::Ceiling($nlines / 2))..($nlines -1)] | Set-Content -Path $file
}
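Note that `.Length` is reported in bytes, and PowerShell's byte-unit literals (`100KB`, `100MB`, `1GB`) make the intended threshold explicit. As a third, minimal sketch (assuming the same file path), the size check and the halving can also be combined with Select-Object -Skip:

```powershell
$file = 'C:\Test.txt'
# 100MB is a PowerShell numeric literal for 100 * 1024 * 1024 bytes
if ((Get-Item -Path $file).Length -gt 100MB) {
    $lines = Get-Content -Path $file
    # drop the first half of the lines, keep the rest
    $lines | Select-Object -Skip ([math]::Floor($lines.Count / 2)) |
        Set-Content -Path $file
}
```

Like the second variant above, this reads the whole file into memory once before rewriting it.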

Related

Search, Increment, Replace and Save to New Files

I'm new to PowerShell.
Let's say I have a file named Grocery_01.txt
And in this text file there are several lines like:
Apple.1.Green
I want to change these texts to >> Apple.2.Green
And save it to a new txt file named Grocery_02.txt
Then, I want to repeat the process until I have a total of 99 files, with the last file named Grocery_99.txt which includes Apple.99.Green
Here is a sample for you, assuming that your Grocery_01 file looks like:
Apple.1.Green
Apple.2.Green
Banana.1.Red
Apple.10.Green
$apples = Get-Content C:\PS\Grocery_01.txt | Select-String -Pattern "Apple"
foreach ($line in $apples) {
$number = $line -replace '\D'  # keep only the digits from the line
New-Item "C:\PS\Grocery_$number.txt"  # create a new file with that number
Add-Content -Path "C:\PS\Grocery_$number.txt" -Value $line
}
You may try this code and see if it suits your output.
Answer 2:
In this case, I will use a CSV file for input and ForEach-Object to process each line.
Import-Csv C:\temp\grocerylist.csv | ForEach-Object {
for ($i = 1; $i -le $($_.Count); $i++) {
if (-not (Test-Path "C:\Temp\test\Grocery_$i.txt")) {
New-Item "C:\Temp\test\Grocery_$i.txt"
Add-Content -Path "C:\temp\test\Grocery_$i.txt" -Value "$($_.FruitName).$i.$($_.Colour)"
}
else {
Add-Content -Path "C:\temp\test\Grocery_$i.txt" -Value "$($_.FruitName).$i.$($_.Colour)"
}
}
}
And my CSV file will look like below (each heading in a different column):
FruitName,Count,Colour
Apple,99,Green
Banana,25,Red
Note: Before you raise a question, please provide details about what you have done so far. That will help you get quick answers.

powershell: delete specific line from x to x

I'm new to PowerShell and I absolutely don't get it...
I just want to delete lines 7 to 2500 of a text file. The first 6 lines should be untouched.
With Linux bash everything is so easy, just:
sed -i '7,2500d' $file
Did not find any solution for mighty PowerShell :-(
Thank you.
Use Get-Content to read the contents of the file into a variable. The variable can be indexed like a regular PowerShell array. Get the parts of the array you need then pipe the variable into Set-Content to write back to the file.
$file = Get-Content test.log
$keep = $file[0..1] + $file[7..($file.Count - 1)]
$keep | Set-Content test.log
Using this as the contents of the file test.log:
One
Two
Three
Four
Five
Six
Seven
Eight
Nine
This script will output the following into test.log (overwriting the contents):
One
Two
Eight
Nine
In your case, you will want to use $file[0..5] + $file[2500..($file.Count - 1)].
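If you prefer a single pipeline closer to the sed one-liner, Get-Content attaches a ReadCount property (the 1-based line number) to each line it emits, and you can filter on that. A minimal sketch, assuming the same test.log:

```powershell
# keep lines 1-6 and everything after line 2500, like sed '7,2500d'
(Get-Content test.log) |
    Where-Object { $_.ReadCount -le 6 -or $_.ReadCount -gt 2500 } |
    Set-Content test.log
```

The parentheses around Get-Content force the whole file to be read before Set-Content reopens it for writing.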
To remove a series of lines in a text file, you could do something like this:
$fileIn = 'D:\Test\File1.txt'
$fileOut = 'D:\Test\File2.txt'
$startRemove = 7
$endRemove = 2500
$currentLine = 1
# needs .NET 4
$newText = foreach ($line in [System.IO.File]::ReadLines($fileIn)) {
if ($currentLine -lt $startRemove -or $currentLine -gt $endRemove) { $line}
$currentLine++
}
$newText | Set-Content -Path $fileOut -Force
Or, if your version of .NET is below 4.0 (note the counter must be reset first):
$currentLine = 1
$reader = [System.IO.File]::OpenText($fileIn)
$newText = while ($null -ne ($line = $reader.ReadLine())) {
if ($currentLine -lt $startRemove -or $currentLine -gt $endRemove) { $line }
$currentLine++
}
$reader.Dispose()
$newText | Set-Content -Path $fileOut -Force
Select-Object -Index takes an array, so:
1..10 > file
(get-content file) | select -index (0..5) | set-content file
get-content file
1
2
3
4
5
6
Or:
(cat file)[0..5] | set-content file

Split a large csv file into multiple csv files according to the size in powershell

I have a large CSV file and I want to split it by size, with the header repeated in every file.
For example, I have this 1.6MB file and each child file shouldn't be more than 512KB, so practically the parent file should have 4 child files.
I tried the simple program below, but it produces blank child files.
function csvSplitter {
$csvFile = "D:\Test\PTest\Dummy.csv";
$split = 10;
$content = Import-Csv $csvFile;
$start = 1;
$end = 0;
$records_per_file = [int][Math]::Ceiling($content.Count / $split);
for($i = 1; $i -le $split; $i++) {
$end += $records_per_file;
$content | Where-Object {[int]$_.Id -ge $start -and [int]$_.Id -le $end} | Export-Csv -Path "D:\Test\PTest\Destination\file$i.csv" -NoTypeInformation;
$start = $end + 1;
}
}
csvSplitter
The logic for the size of the file is yet to be written.
I tried to attach both files, but I guess there is no option to attach files.
this takes a slightly different path to a solution. [grin]
it ...
loads the CSV as a plain text file
saves the 1st line as a header line
calcs the batch size from the total line count & the batch count
uses array index ranges to grab the lines for each batch
combines the header line with the current batch of lines
writes that out to a text file
the reason for such a roundabout method is to save RAM. one drawback to loading the file as a CSV is the sheer amount of RAM needed. just loading the lines of text requires noticeably less RAM.
$SourceDir = $env:TEMP
$InFileName = 'LargeFile.csv'
$InFullFileName = Join-Path -Path $SourceDir -ChildPath $InFileName
$BatchCount = 4
$DestDir = $env:TEMP
$OutFileName = 'LF_Batch_.csv'
$OutFullFileName = Join-Path -Path $DestDir -ChildPath $OutFileName
#region >>> build file to work with
# remove this region when you are ready to do this with your test data OR to do this with real data
if (-not (Test-Path -LiteralPath $InFullFileName))
{
Get-ChildItem -LiteralPath $env:APPDATA -Recurse -File |
Sort-Object -Property Name |
Select-Object Name, Length, LastWriteTime, Directory |
Export-Csv -LiteralPath $InFullFileName -NoTypeInformation
}
#endregion >>> build file to work with
$CsvAsText = Get-Content -LiteralPath $InFullFileName
[array]$HeaderLine = $CsvAsText[0]
$BatchSize = [int]($CsvAsText.Count / $BatchCount) + 1
$StartLine = 1
foreach ($B_Index in 1..$BatchCount)
{
if ($B_Index -ne 1)
{
$StartLine = $StartLine + $BatchSize + 1
}
$CurrentOutFullFileName = $OutFullFileName.Replace('_.', ('_{0}.' -f $B_Index))
$HeaderLine + $CsvAsText[$StartLine..($StartLine + $BatchSize)] |
Set-Content -LiteralPath $CurrentOutFullFileName
}
there is no output on screen, but i got 4 files named LF_Batch_1.csv thru LF_Batch_4.csv that contained the four parts of the source file as expected. the last file has a slightly smaller number of rows, but that is what happens when the row count is not evenly divisible by the batch count. [grin]
Try this:
Add-Type -AssemblyName System.Collections
function Split-Csv {
param (
[string]$filePath,
[int]$partsNum
)
# Use generic lists for import/export
[System.Collections.Generic.List[object]]$contentImport = @()
[System.Collections.Generic.List[object]]$contentExport = @()
# import csv-file
$contentImport = Import-Csv $filePath
# how many lines per export file
$linesPerFile = [Math]::Max( [int]($contentImport.Count / $partsNum), 1 )
# start pointer for source list
$startPointer = 0
# counter for file name
$counter = 1
# main loop
while( $startPointer -lt $contentImport.Count ) {
# clear export list
[void]$contentExport.Clear()
# determine from-to from source list to export
$endPointer = [Math]::Min( $startPointer + $linesPerFile, $contentImport.Count )
# move lines to export to export list
[void]$contentExport.AddRange( $contentImport.GetRange( $startPointer, $endPointer - $startPointer ) )
# export
$contentExport | Export-Csv -Path ($filePath.Replace('.', $counter.ToString() + '.' ) ) -NoTypeInformation -Force
# move pointer
$startPointer = $endPointer
# increase counter for filename
$counter++
}
}
Split-Csv -filePath 'test.csv' -partsNum 7
try running this script:
$sw = new-object System.Diagnostics.Stopwatch
$sw.Start()
$FilePath = $HOME +'\Documents\Projects\ADOPT\Data8277.csv'
$SplitDir = $HOME +'\Documents\Projects\ADOPT\Split\'
CSV-FileSplitter -Path $FilePath -PartSizeBytes 35MB -SplitDir $SplitDir #-Verbose
$sw.Stop()
Write-Host "Split complete in " $sw.Elapsed.TotalSeconds "seconds"
I created this for files larger than 50GB.

New columns into CSV file incredibly slow

I have a bunch of .csv files and I'm trying to add some new column headers and their values (which are all blank anyway), then output this to a new .csv file. My script currently runs and works fine, but it takes about 5 minutes to complete the operation on a 60MB file with about 70,000 rows. I have about 100 files to do this on, so it will take a while using this script.
My code is below, it's quite simple but clearly inefficient!
Import-Csv $strFilePath |
Select-Object *, @{Name='NewHeader';Expression={''}},
@{Name='NewHeader2';Expression={''}},
@{Name='NewHeader3';Expression={''}},
@{Name='NewHeader4';Expression={''}} |
Export-Csv $($strFilePath + ".new") -NoTypeInformation
As pointed out in the comments, it would be better to treat the file as plain text and avoid the needless CSV conversion.
$path = 'C:\test'
$newHeaders = 'NewHeader1','NewHeader2','NewHeader3','NewHeader4'
$files = Get-ChildItem -LiteralPath $path -Filter *.csv
$newHeadersString = @(''; $newHeaders | ForEach-Object { '"{0}"' -f $_ }) -join ','
$newColumnsString = ',""' * $newHeaders.Count
foreach ($file in $files) {
$sr = $file.OpenText()
$outfile = New-Item ($file.FullName + '.new') -Force
$sw = [IO.StreamWriter]::new($outfile.FullName)
$sw.WriteLine($sr.ReadLine() + $newHeadersString)
while (!$sr.EndOfStream) { $sw.WriteLine($sr.ReadLine() + $newColumnsString) }
$sr.Close()
$sw.Close()
}

Powershell .csv merge with column remove

Using the code below I am able to merge several .csv files in 5 seconds.
$getFirstLine = $true
get-childItem "C:\my\dir\*.csv" | foreach {
$filePath = $_
$lines = Get-Content $filePath
$linesToWrite = switch($getFirstLine) {
$true {$lines}
$false {$lines | Select -Skip 1}
}
$getFirstLine = $false
Add-Content "C:\my\dir\output_code2.csv" $linesToWrite
}
I would like to take this one step further, preferable using piping to remove several of the columns using a command like:
select DateAndTime,DG1_KW,DG2_KW,WT_KW,HTR1_KW,POSS_Load_KW,INV1_KW,INV2_SOC|Export-csv output_test.csv -Notypeinformation
those being the column names in the header of each file.
How would I modify this code to make this work? The idea here is that I am going to be working with hundreds up to thousands of files.
I have other code which can do this, but it is nowhere near as fast.
For instance, using 10 .csv files of 450KB each, the code below takes 20 seconds to process and spit out a .csv file, removing 48 of the 56 columns and leaving the variables I need. If I remove the part of the code that trims the columns, it still takes 12+ seconds.
# Directory containing csv files, include *.*
$directory = "C:\my\dir\*.*";
# Get the csv files
$csvFiles = Get-ChildItem -Path $directory -Filter *.csv;
#$content = $null;
$content = @();
# Process each file
foreach($csv in $csvFiles)
{
$content += Import-Csv $csv;
}
# Write a datetime stamped csv file
$datetime = Get-Date -Format "yyyyMMddhhmmss";
$content |Export-Csv -Path "C:\my\dir\output_code2_$datetime.csv" -NoTypeInformation;
The code I would like to modify runs those same 10 files in 5 seconds but does not remove the 48 columns.
Any ideas, guys?
Ok, you want an example... Let's say your CSVs always look like this:
Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8,Col9,Col10
data1,data2,data3,data4,data5,data6,data7,data8,data9,data10
dataA,dataB,dataC,dataD,dataE,dataF,dataG,dataH,dataI,dataJ
Now let's say you only want Col1, Col2, Col6, Col9, and Col10. You could do a RegEx replace something like:
$Files = get-childItem "C:\my\dir\*.csv" | Select -Expand FullName
ForEach($File in $Files){
If($SkipFirst){
Get-Content $File | Select -Skip 1 | ForEach{$_ -replace "^((?:.*?\,){2})(?:.*\,){3}(.*?\,)(?:(?:.*?\,){2})(.*?,.*?)$", '$1$2$3'} | Add-Content "C:\my\dir\output_code2.csv"
}Else{
Get-Content $File | ForEach{$_ -replace "^((?:.*?\,){2})(?:.*\,){3}(.*?\,)(?:(?:.*?\,){2})(.*?,.*?)$", '$1$2$3'} | Add-Content "C:\my\dir\output_code2.csv"
}
}
That would extract just the columns that I noted above. See https://regex101.com/r/jY4oO6/1 for a detailed breakdown of the RegEx string. The effective output would be (skipping the first line if so dictated):
Col1,Col2,Col6,Col9,Col10
data1,data2,data6,data9,data10
dataA,dataB,dataF,dataI,dataJ
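For comparison, the straightforward object-based version of the same merge selects columns by name rather than by position. Per the timings in the question it is the slower route, but it is easier to maintain when the column layout changes. A sketch using the column names and paths from the question:

```powershell
Get-ChildItem 'C:\my\dir\*.csv' |
    ForEach-Object { Import-Csv $_.FullName } |
    Select-Object DateAndTime, DG1_KW, DG2_KW, WT_KW,
                  HTR1_KW, POSS_Load_KW, INV1_KW, INV2_SOC |
    Export-Csv 'C:\my\dir\output_test.csv' -NoTypeInformation
```

Import-Csv de-duplicates the headers automatically, so no first-line-skipping bookkeeping is needed; the cost is building an object per row, which is where the extra seconds go.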