Reducing amout of lines in variable within loop in Powershell

Reducing amout of lines in variable within loop in Powershell - powershell

I have a txt file containing 10000 lines. Each line is an ID.
Within every loop iteration I want to select 100 lines, put them in a special format and do something. I want to do this until the document is finished.
The txt looks like this:
406232C1331283
4062321N022075
4062321H316457
Current approach:
$liste = get-content "C:\x\input.txt"
foreach ($item in $liste) {
azcopy copy $source $target --include-pattern "*$item*" --recursive=true
}
The system will go throug the TXT file and make a copy request for every name it finds in the TXT file. Now the system is able to handle like 300 search-patterns in one request. like
azcopy copy $source $target --include-pattern "*id1*;*id2*;*id3*"
How can I extract 300 items from the document at once, separate them with semicolon and embedd them in wildcard? I tried to pipe everyting in a variable and work with -skip.
But it seems not easy to handle :(

Use the -ReadCount parameter to Get-Content to send multiple lines down the pipeline:
Get-Content "C:\x\input.txt" -ReadCount 300 | ForEach-Object {
$wildCards = ($_ | ForEach-Object { "*$_*" } -join ';'
azcopy copy $source $target --include-pattern $wildCards --recursive=true
}

Do you want 100 or 300 at a time? ;-)
I'm not sure if I really got what the endgoal is but to slice a given amount of elements in chunks of a certain size you can use a for loop like this:
$liste = Get-Content -Path 'C:\x\input.txt'
for ($i = 0; $i -lt $Liste.Count; $i += 100) {
$Liste[$i..$($i + 99)]
}
Now if I got it right you want to join these 100 elements and surround them with certain cahrachters ... this might work:
'"*' + ($Liste[$i..$($i + 99)] -join '*;*') + '*"'
Together it would be this:
$liste = Get-Content -Path 'C:\x\input.txt'
for ($i = 0; $i -lt $Liste.Count; $i += 100) {
'"*' + ($Liste[$i..$($i + 99)] -join '*;*') + '*"'
}

There's many ways, here's one of them...
First I would split array to chunks of 100 elements each, using this helper function:
Function Split-Array ($list, $count) {
$aggregateList = #()
$blocks = [Math]::Floor($list.Count / $count)
$leftOver = $list.Count % $count
for($i=0; $i -lt $blocks; $i++) {
$end = $count * ($i + 1) - 1
$aggregateList += #(,$list[$start..$end])
$start = $end + 1
}
if($leftOver -gt 0) {
$aggregateList += #(,$list[$start..($end+$leftOver)])
}
$aggregateList
}
For example to split your list into chunks of 100 do this:
$Splitted = Split-Array $liste -count 100
Then use foreach to iterate each chunk and join its elements for the pattern you need:
foreach ($chunk in $Splitted)
{
$Pattern = '"' + (($chunk | % {"*$_*"}) -join ";") + '"'
azcopy copy $source $target --include-pattern $Pattern --recursive=true
}

Related

How to sort 30Million csv records in Powershell

I am using oledbconnection to sort the first column of csv file. Oledb connection is executed up to 9 million records within 6 min duration successfully. But when am executing 10 million records, getting following alert message.
Exception calling "ExecuteReader" with "0" argument(s): "The query cannot be completed. Either the size of the query result is larger than the maximum size of a database (2 GB), or
there is not enough temporary storage space on the disk to store the query result."
is there any other solution to sort 30 million using Powershell?
here is my script
$OutputFile = "D:\Performance_test_data\output1.csv"
$stream = [System.IO.StreamWriter]::new( $OutputFile )
$sb = [System.Text.StringBuilder]::new()
$sw = [Diagnostics.Stopwatch]::StartNew()
$conn = New-Object System.Data.OleDb.OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source='D:\Performance_test_data\';Extended Properties='Text;HDR=Yes;CharacterSet=65001;FMT=Delimited';")
$cmd=$conn.CreateCommand()
$cmd.CommandText="Select * from 1crores.csv order by col6"
$conn.open()
$data = $cmd.ExecuteReader()
echo "Query has been completed!"
$stream.WriteLine( "col1,col2,col3,col4,col5,col6")
while ($data.read())
{
$stream.WriteLine( $data.GetValue(0) +',' + $data.GetValue(1)+',' + $data.GetValue(2)+',' + $data.GetValue(3)+',' + $data.GetValue(4)+',' + $data.GetValue(5))
}
echo "data written successfully!!!"
$stream.close()
$sw.Stop()
$sw.Elapsed
$cmd.Dispose()
$conn.Dispose()

You can try using this:
$CSVPath = 'C:\test\CSVTest.csv'
$Delimiter = ';'
# list we use to hold the results
$ResultList = [System.Collections.Generic.List[Object]]::new()
# Create a stream (I use OpenText because it returns a streamreader)
$File = [System.IO.File]::OpenText($CSVPath)
# Read and parse the header
$HeaderString = $File.ReadLine()
# Get the properties from the string, replace quotes
$Properties = $HeaderString.Split($Delimiter).Replace('"',$null)
$PropertyCount = $Properties.Count
# now read the rest of the data, parse it, build an object and add it to a list
while ($File.EndOfStream -ne $true)
{
# Read the line
$Line = $File.ReadLine()
# split the fields and replace the quotes
$LineData = $Line.Split($Delimiter).Replace('"',$null)
# Create a hashtable with the properties (we convert this to a PSCustomObject later on). I use an ordered hashtable to keep the order
$PropHash = [System.Collections.Specialized.OrderedDictionary]#{}
# if loop to add the properties and values
for ($i = 0; $i -lt $PropertyCount; $i++)
{
$PropHash.Add($Properties[$i],$LineData[$i])
}
# Now convert the data to a PSCustomObject and add it to the list
$ResultList.Add($([PSCustomObject]$PropHash))
}
# Now you can sort this list using Linq:
Add-Type -AssemblyName System.Linq
# Sort using propertyname (my sample data had a prop called "Name")
$Sorted = [Linq.Enumerable]::OrderBy($ResultList, [Func[object,string]] { $args[0].Name })
Instead of using import-csv I've written a quick parser which uses a streamreader and parses the CSV data on the fly and puts it in a PSCustomObject.
This is then added to a list.
edit: fixed the linq sample

Putting the performance aside and at least come to a solution that works (meaning one that doesn't hang due to memory shortage) I would rely on the PowerShell pipeline. The issue is thou that for sorting an object you will need to stall te pipeline as the last object might potentially become the first object.
To resolve this part, I would do a coarse division on the first character(s) of the concern property first. Once that is done, fine sort each coarse division and append the results:
Function Sort-BigObject {
[CmdletBinding()] param(
[Parameter(ValueFromPipeLine = $True)]$InputObject,
[Parameter(Position = 0)][String]$Property,
[ValidateRange(1,9)]$Coarse = 1,
[System.Text.Encoding]$Encoding = [System.Text.Encoding]::Default
)
Begin {
$TemporaryFiles = [System.Collections.SortedList]::new()
}
Process {
if ($InputObject.$Property) {
$Grain = $InputObject.$Property.SubString(0, $Coarse)
if (!$TemporaryFiles.Contains($Grain)) { $TemporaryFiles[$Grain] = New-TemporaryFile }
$InputObject | Export-Csv $TemporaryFiles[$Grain] -Encoding $Encoding -Append
} else { $InputObject.$Property }
}
End {
Foreach ($TemporaryFile in $TemporaryFiles.Values) {
Import-Csv $TemporaryFile -Encoding $Encoding | Sort-Object $Property
Remove-Item -LiteralPath $TemporaryFile
}
}
}
Usage
(Don't assign the stream to a variable and don't use parenthesis.)
Import-Csv .\1crores.csv | Sort-BigObject <PropertyName> | Export-Csv .\output.csv
If the temporary files still get too big to handle, you might need to increase the -Coarse parameter
Caveats (improvement considerations)
Objects with an empty sort property will be immediately outputted
The sort column is presumed to be a (single) string column
I presume the performance is poor (I didn't do a full test on 30 million records, but 10.000 records take about 8 second which means about 8 hours). Consider replacing native PowerShell cmdlets with .Net streaming methods. buffer/cache file input and outputs, parallel processing?

You could try SQLite:
$OutputFile = "D:\Performance_test_data\output1.csv"
$sw = [Diagnostics.Stopwatch]::StartNew()
sqlite3 output1.db '.mode csv' '.import 1crores.csv 1crores' '.headers on' ".output $OutputFile" 'Select * from 1crores order by 最終アクセス日時'
echo "data written successfully!!!"
$sw.Stop()
$sw.Elapsed

I have added a new answer as this is a complete different approach to tackle this issue.
Instead of creating temporary files (which presumable causes a lot of file opens and closures), you might consider to create a ordered list of indices and than go over the input file (-FilePath) multiple times and each time, process a selective number of lines (-BufferSize = 1Gb, you might have to tweak this "memory usage vs. performance" parameter):
Function Sort-Csv {
[CmdletBinding()] param(
[string]$InputFile,
[String]$Property,
[string]$OutputFile,
[Char]$Delimiter = ',',
[System.Text.Encoding]$Encoding = [System.Text.Encoding]::Default,
[Int]$BufferSize = 1Gb
)
Begin {
if ($InputFile.StartsWith('.\')) { $InputFile = Join-Path (Get-Location) $InputFile }
$Index = 0
$Dictionary = [System.Collections.Generic.SortedDictionary[string, [Collections.Generic.List[Int]]]]::new()
Import-Csv $InputFile -Delimiter $Delimiter -Encoding $Encoding | Foreach-Object {
if (!$Dictionary.ContainsKey($_.$Property)) { $Dictionary[$_.$Property] = [Collections.Generic.List[Int]]::new() }
$Dictionary[$_.$Property].Add($Index++)
}
$Indices = [int[]]($Dictionary.Values | ForEach-Object { $_ })
$Dictionary = $Null # we only need the sorted index list
}
Process {
$Start = 0
$ChunkSize = [int]($BufferSize / (Get-Item $InputFile).Length * $Indices.Count / 2.2)
While ($Start -lt $Indices.Count) {
[System.GC]::Collect()
$End = $Start + $ChunkSize - 1
if ($End -ge $Indices.Count) { $End = $Indices.Count - 1 }
$Chunk = #{}
For ($i = $Start; $i -le $End; $i++) { $Chunk[$Indices[$i]] = $i }
$Reader = [System.IO.StreamReader]::new($InputFile, $Encoding)
$Header = $Reader.ReadLine()
$i = $Start
$Count = 0
For ($i = 0; ($Line = $Reader.ReadLine()) -and $Count -lt $ChunkSize; $i++) {
if ($Chunk.Contains($i)) { $Chunk[$i] = $Line }
}
$Reader.Dispose()
if ($OutputFile) {
if ($OutputFile.StartsWith('.\')) { $OutputFile = Join-Path (Get-Location) $OutputFile }
$Writer = [System.IO.StreamWriter]::new($OutputFile, ($Start -ne 0), $Encoding)
if ($Start -eq 0) { $Writer.WriteLine($Header) }
For ($i = $Start; $i -le $End; $i++) { $Writer.WriteLine($Chunk[$Indices[$i]]) }
$Writer.Dispose()
} else {
$Start..$End | ForEach-Object { $Header } { $Chunk[$Indices[$_]] } | ConvertFrom-Csv -Delimiter $Delimiter
}
$Chunk = $Null
$Start = $End + 1
}
}
}
Basic usage
Sort-Csv .\Input.csv <PropertyName> -Output .\Output.csv
Sort-Csv .\Input.csv <PropertyName> | ... | Export-Csv .\Output.csv
Note that for 1Crones.csv it will probably just export the full file in once unless you set the -BufferSize to a lower amount e.g. 500Kb.

I downloaded gnu sort.exe from here: http://gnuwin32.sourceforge.net/packages/coreutils.htm It also requires libiconv2.dll and libintl3.dll from the dependency zip. I basically did this within cmd.exe, and it used a little less than a gig of ram and took about 5 minutes. It's a 500 meg file of about 30 million random numbers. This command can also merge sorted files with --merge. You can also specify begin and end key position for sorting --key. It automatically uses temp files.
.\sort.exe < file1.csv > file2.csv
Actually it works in a similar way with the windows sort from the cmd prompt. The windows sort also has a /+n option to specify what character column to start the sort by.
sort.exe < file1.csv > file2.csv

Powershell Open File, Edit File, Update File - document lock issue

I have a text file with a list of multiple files that exceeded x characters. What I am trying to do is open each file, scan each line of the file, and if a file is more than x characters long I move the line to the next line so the file does not exceed x characters. That piece works great. The problem I am having is updating the text file I am trying to change/edit. I suspect the lock is the powershell script since the script is reading the file. Does anyone have any ideas on what I can do to update the original text file or remove the lock? Thanks for any help! My code is below:
[int] $limit = 131
$path = get-content C:\document\fix.txt
foreach ($f in $path)
{
Get-Content -path $f |
ForEach-Object {
$line = $_
for ($i = 0; $i -lt $line.Length; $i += $limit)
{
$length = [Math]::Min($limit, $line.Length - $i)
$line.SubString($i, $length)
}
} |
Set-Content $f
}

I figured it out! I had to put ( ) around get-content. Basically means - finish what you are doing before going to the next step.
[int] $limit = 131
$path = get-content C:\Soarian\fix.txt
foreach ($f in $path)
{
(Get-Content -path $f) |
ForEach-Object {
$line = $_
for ($i = 0; $i -lt $line.Length; $i += $limit)
{
$length = [Math]::Min($limit, $line.Length - $i)
$line.SubString($i, $length)
}
} |
Set-Content $f -Force
}

Changing the Delimiter in a large CSV file using Powershell

I am in need of a way to change the delimiter in a CSV file from a comma to a pipe. Because of the size of the CSV files (~750 Mb to several Gb), using Import-CSV and/or Get-Content is not an option. What I'm using (and what works, albeit slowly) is the following code:
$reader = New-Object Microsoft.VisualBasic.FileIO.TextFieldParser $source
$reader.SetDelimiters(",")
While(!$reader.EndOfData)
{
$line = $reader.ReadFields()
$details = [ordered]#{
"Plugin ID" = $line[0]
CVE = $line[1]
CVSS = $line[2]
Risk = $line[3]
}
$export = New-Object PSObject -Property $details
$export | Export-Csv -Append -Delimiter "|" -Force -NoTypeInformation -Path "C:\MyFolder\Delimiter Change.csv"
}
This little loop took nearly 2 minutes to process a 20 Mb file. Scaling up at this speed would mean over an hour for the smallest CSV file I'm currently working with.
I've tried this as well:
While(!$reader.EndOfData)
{
$line = $reader.ReadFields()
$details = [ordered]#{
# Same data as before
}
$export.Add($details) | Out-Null
}
$export | Export-Csv -Append -Delimiter "|" -Force -NoTypeInformation -Path "C:\MyFolder\Delimiter Change.csv"
This is MUCH FASTER but doesn't provide the right information in the new CSV. Instead I get rows and rows of this:
"Count"|"IsReadOnly"|"Keys"|"Values"|"IsFixedSize"|"SyncRoot"|"IsSynchronized"
"13"|"False"|"System.Collections.Specialized.OrderedDictionary+OrderedDictionaryKeyValueCollection"|"System.Collections.Specialized.OrderedDictionary+OrderedDictionaryKeyValueCollection"|"False"|"System.Object"|"False"
"13"|"False"|"System.Collections.Specialized.OrderedDictionary+OrderedDictionaryKeyValueCollection"|"System.Collections.Specialized.OrderedDictionary+OrderedDictionaryKeyValueCollection"|"False"|"System.Object"|"False"
So, two questions:
1) Can the first block of code be made faster?
2) How can I unwrap the arraylist in the second example to get to the actual data?
EDIT: Sample data found here - http://pastebin.com/6L98jGNg

This is simple text-processing, so the bottleneck should be disk read speed:
1 second per 100 MB or 10 seconds per 1GB for the OP's sample (repeated to the mentioned size) as measured here on i7. The results would be worse for files with many/all small quoted fields.
The algo is simple:
Read the file in big string chunks e.g. 1MB.
It's much faster than reading millions of lines separated by CR/LF because:
less checks are performed as we mostly/primarily look only for doublequotes;
less iterations of our code executed by the interpreter which is slow.
Find the next doublequote.
Depending on the current $inQuotedField flag decide whether the found doublequote starts a quoted field (should be preceded by , + some spaces optionally) or ends the current quoted field (should be followed by any even number of doublequotes, optionally spaces, then ,).
Replace delimiters in the preceding span or to the end of 1MB chunk if no quotes were found.
The code makes some reasonable assumptions but it may fail to detect an escaped field if its doublequote is followed or preceded by more than 3 spaces before/after field delimiter. The checks won't be too hard to add, and I might've missed some other edge case, but I'm not that interested.
$sourcePath = 'c:\path\file.csv'
$targetPath = 'd:\path\file2.csv'
$targetEncoding = [Text.UTF8Encoding]::new($false) # no BOM
$delim = [char]','
$newDelim = [char]'|'
$buf = [char[]]::new(1MB)
$sourceBase = [IO.FileStream]::new(
$sourcePath,
[IO.FileMode]::open,
[IO.FileAccess]::read,
[IO.FileShare]::read,
$buf.length, # let OS prefetch the next chunk in background
[IO.FileOptions]::SequentialScan)
$source = [IO.StreamReader]::new($sourceBase, $true) # autodetect encoding
$target = [IO.StreamWriter]::new($targetPath, $false, $targetEncoding, $buf.length)
$bufStart = 0
$bufPadding = 4
$inQuotedField = $false
$fieldBreak = [char[]]#($delim, "`r", "`n")
$out = [Text.StringBuilder]::new($buf.length)
while ($nRead = $source.Read($buf, $bufStart, $buf.length-$bufStart)) {
$s = [string]::new($buf, 0, $nRead+$bufStart)
$len = $s.length
$pos = 0
$out.Clear() >$null
do {
$iQuote = $s.IndexOf([char]'"', $pos)
if ($inQuotedField) {
$iDelim = if ($iQuote -ge 0) { $s.IndexOf($delim, $iQuote+1) }
if ($iDelim -eq -1 -or $iQuote -le 0 -or $iQuote -ge $len - $bufPadding) {
# no closing quote in buffer safezone
$out.Append($s.Substring($pos, $len-$bufPadding-$pos)) >$null
break
}
if ($s.Substring($iQuote, $iDelim-$iQuote+1) -match "^(""+)\s*$delim`$") {
# even number of quotes are just quoted quotes
$inQuotedField = $matches[1].length % 2 -eq 0
}
$out.Append($s.Substring($pos, $iDelim-$pos+1)) >$null
$pos = $iDelim + 1
continue
}
if ($iQuote -ge 0) {
$iDelim = $s.LastIndexOfAny($fieldBreak, $iQuote)
if (!$s.Substring($iDelim+1, $iQuote-$iDelim-1).Trim()) {
$inQuotedField = $true
}
$replaced = $s.Substring($pos, $iQuote-$pos+1).Replace($delim, $newDelim)
} elseif ($pos -gt 0) {
$replaced = $s.Substring($pos).Replace($delim, $newDelim)
} else {
$replaced = $s.Replace($delim, $newDelim)
}
$out.Append($replaced) >$null
$pos = $iQuote + 1
} while ($iQuote -ge 0)
$target.Write($out)
$bufStart = 0
for ($i = $out.length; $i -lt $s.length; $i++) {
$buf[$bufStart++] = $buf[$i]
}
}
if ($bufStart) { $target.Write($buf, 0, $bufStart) }
$source.Close()
$target.Close()

Still not what I would call fast, but this is considerably faster than what you have listed by using the -Join operator:
$reader = New-Object Microsoft.VisualBasic.fileio.textfieldparser $source
$reader.SetDelimiters(",")
While(!$reader.EndOfData){
$line = $reader.ReadFields()
$line -join '|' | Add-Content C:\Temp\TestOutput.csv
}
That took a hair under 32 seconds to process a 20MB file. At that rate your 750MB file would be done in under 20 minutes, and bigger files should go at about 26 minutes per gig.

Split CSV with powershell

I have large CSV files (50-500 MB each). Running complicated power shell commands on these takes forever and/or hits memory issues.
Processing the data requires grouping by common fields, say in ColumnA. So assuming that the data is already sorted by that column, if I split these files randomly (i.e. each x-thousand lines) then matching entries could still end up in different parts. There are thousands of different groups in A, so splitting every one into a single file would create to many files.
How can I split it into files of 10,000-ish lines and not lose the groups? E.g. rows 1-13 would be A1 in Column A, rows 14-17 would be A2 etc. and row 9997-10012 would be A784. In this case i would want the first file to contain rows 1-10012 and the next one to start with row 10013.
Obviously I would want to keep the entire rows (rather than just Column A), so if I pasted all the resulting files together this would be the same as the original file.

Not tested. This assumes ColumnA is the first column and it's common comma-delimited data. You'll need to adjust the line that creates the regex to suit your data.
$count = 0
$header = get-content file.csv -TotalCount 1
get-content file.csv -ReadCount 1000 |
foreach {
#add tail entries from last batch to beginning of this batch
$newbatch = $tail + $_
#create regex to match last entry in this batch
$regex = '^' + [regex]::Escape(($newbatch[-1].split(',')[0]))
#Extract everything that doesn't match the last entry to new file
#Add header if this is not the first file
if ($count)
{
$header |
set-content "c:\somedir\filepart_$count"
}
$newbatch -notmatch $regex |
add-content "c:\somedir\filepart_$count"
#Extact tail entries to add to next batch
$tail = #($newbatch -match $regex)
#Increment file counter
$count++
}

This is my attempt, it got messy :-P It will load the whole file into memory while splitting it, but this is pure text. It should take less memory then imported objects, but still about the size of the file.
$filepath = "C:\Users\graimer\Desktop\file.csv"
$file = Get-Item $filepath
$content = Get-Content $file
$csvheader = $content[0]
$lines = $content.Count
$minlines = 10000
$filepart = 1
$start = 1
while ($start -lt $lines - 1) {
#Set minimum $end value (last line)
if ($start + $minlines -le $lines - 1) { $end = $start + $minlines - 1 } else { $end = $lines - 1 }
#Value to compare. ColA is first column in my file = [0] . ColB is second column = [1]
$avalue = $content[$end].split(",")[0]
#If not last line in script
if ($end -ne $lines -1) {
#Increase $end by 1 while ColA is the same
while ($content[$end].split(",")[0] -eq $avalue) { $end++ }
#Return to last line with equal ColA value
$end--
}
#Create new csv-part
$filename = $file.FullName.Replace($file.BaseName, ($file.BaseName + ".part$filepart"))
#($csvheader, $content[$start..$end]) | Set-Content $filename
#Fix counters
$filepart++
$start = $end + 1
}
file.csv:
ColA,ColB,ColC
A1,1,10
A1,2,20
A1,3,30
A2,1,10
A2,2,20
A3,1,10
A4,1,10
A4,2,20
A4,3,30
A4,4,40
A4,5,50
A4,6,60
A5,1,10
A6,1,10
A7,1,10
Results (I used $minlines = 5):
file.part1.csv:
ColA,ColB,ColC
A1,1,10
A1,2,20
A1,3,30
A2,1,10
A2,2,20
file.part2.csv:
ColA,ColB,ColC
A3,1,10
A4,1,10
A4,2,20
A4,3,30
A4,4,40
A4,5,50
A4,6,60
file.part3.csv:
ColA,ColB,ColC
A5,1,10
A6,1,10
A7,1,10

This requires PowerShell v3 (due to -append on Export-CSV).
Also, I'm assuming that you have column headers and the first column is named col1. Adjust as necessary.
import-csv MYFILE.csv|foreach-object{$_|export-csv -notypeinfo -noclobber -append ($_.col1 + ".csv")}
This will create one file for each distinct value in the first column, with that value as the file name.

To compliment the helpful answer from mjolinor with a reusable function with a few additional parameters and using the steppable pipeline which is about a factor 8 faster:
function Split-Content {
[CmdletBinding()]
param (
[Parameter(Mandatory=$true)][String]$Path,
[ULong]$HeadSize,
[ValidateRange(1, [ULong]::MaxValue)][ULong]$DataSize = [ULong]::MaxValue,
[Parameter(Mandatory=$true, ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)]$Value
)
begin {
$Header = [Collections.Generic.List[String]]::new()
$DataCount = 0
$PartNr = 1
}
Process {
$ReadCount = 0
while ($ReadCount -lt #($_).Count -and $Header.Count -lt $HeadSize) {
if (#($_)[$ReadCount]) { $Header.Add(#($_)[$ReadCount]) }
$ReadCount++
}
if ($ReadCount -lt #($_).Count -and $Header.Count -ge $HeadSize) {
do {
if ($DataCount -le 0) { # Should never be less
$FileInfo = [System.IO.FileInfo]$ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Path)
$FileName = $FileInfo.BaseName + $PartNr++ + $FileInfo.Extension
$LiteralPath = [System.IO.Path]::Combine($FileInfo.DirectoryName, $FileName)
$steppablePipeline = { Set-Content -LiteralPath $LiteralPath }.GetSteppablePipeline()
$steppablePipeline.Begin($PSCmdlet)
$steppablePipeline.Process($Header)
}
$Next = [math]::min(($DataSize - $DataCount), #($_).Count)
if ($Next -gt $ReadCount) { $steppablePipeline.Process(#($_)[$ReadCount..($Next - 1)]) }
$DataCount = ($DataCount + $Next - $ReadCount) % $DataSize
if ($DataCount -le 0) { $steppablePipeline.End() }
$ReadCount = $Next % #($_).Count
} while ($ReadCount)
}
}
End {
if ($steppablePipeline) { $steppablePipeline.End() }
}
}
Parameters
Value
Specifies the listed content lines to be broken into parts. Multiple lines sent through the pipeline at a time (aka sub arrays like Object[]) will also be passed to the output file at a time (assuming that is fits the -DataSize).
Path
Specifies a path to one or more locations. Each filename in the location is suffixed with a part number (starting with 1).
HeadSize
The specifies the number of lines of the header that will be taken from the input and preceded in each file part. The default is 0, meaning no header line are copied.
DataSize
The specifies the number of lines that will be successively taken (after the header) from the input as data and pasted into each file part. The default is [ULong]::MaxValue, basically meaning that all data is copied to a single file.
Example 1:
Get-Content -ReadCount 1000 .\Test.Csv |Split-Content -Path .\Part.Csv -HeadSize 1 -DataSize 10000
This will split the .\Test.Csv file in chuncks of csv files with 10000 rows
Note that the performance of this Split-Content function highly depends on the -ReadCount of the prior Get-Content cmdlet.
Example 2:
Get-Process |Out-String -Stream |Split-Content -Path .\Process.Txt -HeadSize 2 -DataSize 20
This will write chunks of 20 processes to the .\Process<PartNr>.Txt files preceded with the standard (2 line) header format:
NPM(K) PM(M) WS(M) CPU(s) Id SI ProcessName
------ ----- ----- ------ -- -- -----------
... # 20 rows following

Grab the first 10 characters of each line in a file, from Left to Right, in Powershell

I'd like to take the first x amount of characters from $line and move them into a variable.
How do I do this?
Here is my existing code
$data = get-content "C:\TestFile.txt"
foreach($line in $data)
{
if($line.length -gt 250){**get first x amount of characters into variable Y**}
}

like this:
$data = get-content "TestFile.txt"
$amount = 10;
$y= #()
foreach($line in $data)
{
if ( $line.length -gt 250){ $y += $line.substring(0,$amount) }
}

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Reducing amout of lines in variable within loop in Powershell - powershell

Use the -ReadCount parameter to Get-Content to send multiple lines down the pipeline: Get-Content "C:\x\input.txt" -ReadCount 300 | ForEach-Object { $wildCards = ($_ | ForEach-Object { "$_" } -join ';' azcopy copy $source $target --include-pattern $wildCards --recursive=true }

Related

How to sort 30Million csv records in Powershell

Powershell Open File, Edit File, Update File - document lock issue

Changing the Delimiter in a large CSV file using Powershell

Split CSV with powershell

Grab the first 10 characters of each line in a file, from Left to Right, in Powershell

Categories

Resources

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Reducing amout of lines in variable within loop in Powershell - powershell

Use the -ReadCount parameter to Get-Content to send multiple lines down the pipeline: Get-Content "C:\x\input.txt" -ReadCount 300 | ForEach-Object { $wildCards = ($_ | ForEach-Object { "*$_*" } -join ';' azcopy copy $source $target --include-pattern $wildCards --recursive=true }

Related

How to sort 30Million csv records in Powershell

Powershell Open File, Edit File, Update File - document lock issue

Changing the Delimiter in a large CSV file using Powershell

Split CSV with powershell

Grab the first 10 characters of each line in a file, from Left to Right, in Powershell

Categories

Resources

Use the -ReadCount parameter to Get-Content to send multiple lines down the pipeline: Get-Content "C:\x\input.txt" -ReadCount 300 | ForEach-Object { $wildCards = ($_ | ForEach-Object { "$_" } -join ';' azcopy copy $source $target --include-pattern $wildCards --recursive=true }