PowerShell 7.0: how to compute the hash sum of a big file read in chunks

The script should copy files and compute their hash sums.
My goal is to write a function that reads the file once instead of three times (read_for_copy + read_for_hash + read_for_another_copy) to minimize network load.
So I tried to read a chunk of the file, compute the MD5 hash sum, and write the chunk out to several places.
The file size may vary from 100 MB up to 2 TB, maybe more. There is no need to verify that the copies are identical at this point; I just need the hash sum of the source files.
I am stuck on computing the hash sum:
$ifile = "C:\Users\User\Desktop\inputfile"
$ofile = "C:\Users\User\Desktop\outputfile_1"
$ofile2 = "C:\Users\User\Desktop\outputfile_2"
$md5 = new-object -TypeName System.Security.Cryptography.MD5CryptoServiceProvider
$bufferSize = 10mb
$stream = [System.IO.File]::OpenRead($ifile)
$makenew = [System.IO.File]::OpenWrite($ofile)
$makenew2 = [System.IO.File]::OpenWrite($ofile2)
$buffer = new-object Byte[] $bufferSize
while ( $stream.Position -lt $stream.Length ) {
    $bytesRead = $stream.Read($buffer, 0, $bufferSize)
    $makenew.Write($buffer, 0, $bytesread)
    $makenew2.Write($buffer, 0, $bytesread)
    # I am stuck here
    $hash = [System.BitConverter]::ToString($md5.ComputeHash($buffer)) -replace "-",""
}
How can I collect the chunks of data to compute the hash of the whole file?
An extra question: is it possible to calculate the hash and write the data out in parallel? Especially given that workflow {parallel{}} is not supported from PowerShell 6 onwards?
Many thanks

If you want to handle input buffering manually, you need to use the TransformBlock/TransformFinalBlock methods exposed by $md5:
while($bytesRead = $stream.Read($buffer, 0, $bufferSize)) {
    # Write to file copies
    $makenew.Write($buffer, 0, $bytesRead)
    $makenew2.Write($buffer, 0, $bytesRead)
    # Feed next chunk to MD5 CSP
    $null = $md5.TransformBlock($buffer, 0, $bytesRead, $null, 0)
}
# Complete the hashing routine (discard the returned empty array)
$null = $md5.TransformFinalBlock([byte[]]::new(0), 0, 0)
# Grab hash value from CSP
$hash = [BitConverter]::ToString($md5.Hash).Replace('-','')
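To convince yourself that feeding chunks through TransformBlock gives the same result as hashing everything in one call, here is a minimal, self-contained sketch. It uses an in-memory byte array instead of a real file, and the sizes (100000 bytes of data, 4096-byte buffer) are arbitrary assumptions chosen for illustration:

```powershell
# Sample data standing in for a file (size and seed are arbitrary)
$data = [byte[]]::new(100000)
[System.Random]::new(42).NextBytes($data)

# Hash the data chunk by chunk
$md5 = [System.Security.Cryptography.MD5]::Create()
$source = [System.IO.MemoryStream]::new($data)
$buffer = [byte[]]::new(4096)
try {
    while (($bytesRead = $source.Read($buffer, 0, $buffer.Length)) -gt 0) {
        # Only the first $bytesRead bytes of the buffer are valid
        $null = $md5.TransformBlock($buffer, 0, $bytesRead, $null, 0)
    }
    # Finish with an empty final block, then read the Hash property
    $null = $md5.TransformFinalBlock([byte[]]::new(0), 0, 0)
    $chunkedHash = [BitConverter]::ToString($md5.Hash).Replace('-', '')
}
finally {
    $source.Dispose()
    $md5.Dispose()
}

# Hash the same data in a single call for comparison
$md5Whole = [System.Security.Cryptography.MD5]::Create()
$wholeHash = [BitConverter]::ToString($md5Whole.ComputeHash($data)).Replace('-', '')
$md5Whole.Dispose()

# Both approaches should produce the same 32-character hex string
$chunkedHash -eq $wholeHash
```

The important detail is passing $bytesRead rather than the buffer length, since the last read is usually shorter than the buffer.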
You say your goal is to read the file once instead of three times (read_for_copy + read_for_hash + read_for_another_copy) to minimize network load.
I'm not entirely sure what you mean by network load here. If the source file is on a remote file share but the new copies go onto a local file system, you can minimize network load by copying the source file once, then using that local copy as the source for both the second copy and the hash calculation:
$ifile = "\\remoteMachine\c$\Users\User\Desktop\inputfile"
$ofile = "C:\Users\User\Desktop\outputfile_1"
$ofile2 = "C:\Users\User\Desktop\outputfile_2"
# Copy remote -> local
Copy-Item -Path $ifile -Destination $ofile
# Copy local -> local
Copy-Item -Path $ofile -Destination $ofile2
# Hash local file stream
$md5 = New-Object -TypeName System.Security.Cryptography.MD5CryptoServiceProvider
$stream = [System.IO.File]::OpenRead($ofile)
$hash = [BitConverter]::ToString($md5.ComputeHash($stream)).Replace('-','')
FWIW, passing the file stream object directly to $md5.ComputeHash($stream) is likely to be faster than manually buffering the input.

Final listing
$ifile = "C:\Users\User\Desktop\inputfile"
$ofile = "C:\Users\User\Desktop\outputfile_1"
$ofile2 = "C:\Users\User\Desktop\outputfile_2"
$md5 = new-object -TypeName System.Security.Cryptography.MD5CryptoServiceProvider
$bufferSize = 1mb
$stream = [System.IO.File]::OpenRead($ifile)
$makenew = [System.IO.File]::OpenWrite($ofile)
$makenew2 = [System.IO.File]::OpenWrite($ofile2)
$buffer = new-object Byte[] $bufferSize
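while ( $stream.Position -lt $stream.Length ) {
    $bytesRead = $stream.Read($buffer, 0, $bufferSize)
    $makenew.Write($buffer, 0, $bytesRead)
    $makenew2.Write($buffer, 0, $bytesRead)
    # Feed the chunk to the hash; the return value (bytes processed) is not needed
    $null = $md5.TransformBlock($buffer, 0, $bytesRead, $null, 0)
}
$null = $md5.TransformFinalBlock([byte[]]::new(0), 0, 0)
$hash = [BitConverter]::ToString($md5.Hash).Replace('-','')
# Release the file handles so the copies are flushed to disk
$stream.Dispose()
$makenew.Dispose()
$makenew2.Dispose()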


Final Update: Turns out I didn't need Binary writer. I could just copy memory streams from one archive to another.
I'm rewriting a PowerShell script which works with archives. I'm using two functions from "Expand-Archive without Importing and Exporting files" and can successfully read and write files to the archive. I've posted the whole program in case it makes things clearer for someone to help me.
However, there are three issues (besides the fact that I don't really know what I'm doing).
1.) Most files produce this error when running
Add-ZipEntry -ZipFilePath ($OriginalArchivePath + $PartFileDirectoryName) -EntryPath $entry.FullName -Content $fileBytes
Cannot convert value "507" to type "System.Byte". Error: "Value was either too large or too small for an unsigned byte." (replace 507 with whatever number from the byte array is there)
2.) When it reads a file and adds it to the zip archive (*.imscc), it adds the character "a" to the beginning of the file contents.
3.) The only files it doesn't error on are text files, when I really want it to handle any file.
Thank you for any assistance!
Update: I've tried using System.IO.BinaryWriter, with the same errors.
Add-Type -AssemblyName 'System.Windows.Forms'
Add-Type -AssemblyName 'System.IO.Compression'
Add-Type -AssemblyName 'System.IO.Compression.FileSystem'
function Folder-SuffixGenerator($SplitFileCounter) {
    return ' ('+$usrSuffix+' '+$SplitFileCounter+')'
}
function Get-ZipEntryContent(#returns the bytes of the first matching entry
    [string] $ZipFilePath, #optional - specify a ZipStream or path
    [IO.Stream] $ZipStream = (New-Object IO.FileStream($ZipFilePath, [IO.FileMode]::Open)),
    [string] $EntryPath){
    $ZipArchive = New-Object IO.Compression.ZipArchive($ZipStream, [IO.Compression.ZipArchiveMode]::Read)
    $buf = New-Object byte[] (0) #return an empty byte array if not found
    $ZipArchive.GetEntry($EntryPath) | ?{$_} | %{ #GetEntry returns first matching entry or null if there is no match
        $buf = New-Object byte[] ($_.Length)
        Write-Verbose " reading: $($_.Name)"
    }
    return ,$buf
}
function Add-ZipEntry(#Adds an entry to the $ZipStream. Sample call: Add-ZipEntry -ZipFilePath "$PSScriptRoot\temp.zip" -EntryPath Test.xml -Content ([text.encoding]::UTF8.GetBytes("Testing"))
    [string] $ZipFilePath, #optional - specify a ZipStream or path
    [IO.Stream] $ZipStream = (New-Object IO.FileStream($ZipFilePath, [IO.FileMode]::OpenOrCreate)),
    [string] $EntryPath,
    [byte[]] $Content,
    [switch] $OverWrite, #if specified, will not create a second copy of an existing entry
    [switch] $PassThru ){#return a copy of $ZipStream
    $ZipArchive = New-Object IO.Compression.ZipArchive($ZipStream, [IO.Compression.ZipArchiveMode]::Update, $true)
    $ExistingEntry = $ZipArchive.GetEntry($EntryPath) | ?{$_}
    If($OverWrite -and $ExistingEntry){
        Write-Verbose " deleting existing $($ExistingEntry.FullName)"
    }
    $Entry = $ZipArchive.CreateEntry($EntryPath)
    $WriteStream = New-Object System.IO.StreamWriter($Entry.Open())
    $OutStream = New-Object System.IO.MemoryStream
    $ZipStream.Seek(0, 'Begin') | Out-Null
}
$NoDeleteFiles = @('files_meta.xml', 'course_settings.xml', 'assignment_groups.xml', 'canvas_export.txt', 'imsmanifest.xml')
Set-Variable usrSuffix -Option ReadOnly -Value 'part' -Force
$MaxImportFileSize = 1000
$compressionLevel = [System.IO.Compression.CompressionLevel]::Optimal
$SplitFileCounter = 1
$FileBrowser = New-Object System.Windows.Forms.OpenFileDialog
$FileBrowser.filter = "Canvas Export Files (*.imscc)| *.imscc"
$FilePath = $FileBrowser.FileName
$OriginalArchivePath = $FilePath.Substring(0,$FilePath.Length-6)
$PartFileDirectoryName = $OriginalArchive + (Folder-SuffixGenerator($SplitFileCounter)) + '.imscc'
$CourseZip = [IO.Compression.ZipFile]::OpenRead($FilePath)
$CourseZipFiles = $CourseZip.Entries | Sort Length -Descending
$SortingTable = $CourseZip.entries | Select Fullname,
Sort Size -Descending | format-table –AutoSize
# Add mandatory files
ForEach($entry in $CourseZipFiles) {
    if ($NoDeleteFiles.Contains($entry.Name)){
        Write-Output "Adding to Zip" + $entry.FullName
        # Add to Zip
        $fileBytes = Get-ZipEntryContent -ZipFilePath $FilePath -EntryPath $entry.FullName
        Add-ZipEntry -ZipFilePath ($OriginalArchivePath + $PartFileDirectoryName) -EntryPath $entry.FullName -Content $fileBytes
    }
}
System.IO.StreamWriter is a text writer and therefore not suitable for writing raw bytes. The error Cannot convert value "507" to type "System.Byte" indicates an inappropriate attempt to convert text - a .NET string composed of [char] instances, which are in effect [uint16] code units (range 0x0 - 0xffff) - to [byte] instances (0x0 - 0xff). Therefore, any character whose code unit value is greater than 255 (0xff) will cause this error.
The solution is to use a .NET API that allows writing raw bytes, namely System.IO.BinaryWriter:
$WriteStream = [System.IO.BinaryWriter]::new($Entry.Open())
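As a rough illustration of the difference, the following sketch writes a few raw bytes (including values above 127) into a zip entry with BinaryWriter and reads them back unchanged. It works on an in-memory archive, and the entry name 'sample.bin' and the sample bytes are made up for the example:

```powershell
Add-Type -AssemblyName 'System.IO.Compression'

# Sample payload with byte values that a text writer would not preserve
[byte[]]$content = 0, 1, 127, 128, 200, 255
$zipStream = [System.IO.MemoryStream]::new()

# Write the bytes into a new entry; $true keeps $zipStream open after Dispose
$archive = [System.IO.Compression.ZipArchive]::new($zipStream, [System.IO.Compression.ZipArchiveMode]::Create, $true)
$entry = $archive.CreateEntry('sample.bin')
$writer = [System.IO.BinaryWriter]::new($entry.Open())
$writer.Write($content, 0, $content.Length)
$writer.Dispose()
$archive.Dispose()

# Read the entry back by copying its stream into a MemoryStream
$zipStream.Position = 0
$archive = [System.IO.Compression.ZipArchive]::new($zipStream, [System.IO.Compression.ZipArchiveMode]::Read)
$readBack = [System.IO.MemoryStream]::new()
$entryStream = $archive.GetEntry('sample.bin').Open()
$entryStream.CopyTo($readBack)
$entryStream.Dispose()
$archive.Dispose()

$roundTripped = $readBack.ToArray()
$roundTripped -join ','
```

Copying streams with CopyTo, as in the read-back half, avoids a writer object altogether when the content already exists as a stream.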

I use PowerShell to parse huge files and quickly look at a small part of the file where a certain string occurs, like this:
Select-String P120300420059211107104259.txt -Pattern "<ID>9671510841" -Context 0,300
This gives me the 300 lines of the file after the occurrence of that ID number.
But I've come across a file that has no carriage returns. I would like to do the same thing, but instead of lines being returned, I guess I need characters.
How would I do this?
I have never created scripts in PowerShell; I have only run simple commands like the one above.
I would like to see maybe 1000 characters after the matched string, within a huge file.
The problem with using Select-String or [Regex]::Matches() (or -match) to test for the presence of a substring in a single-line file is that you first need to read the whole file into memory at once.
The good news is that you don't need regular expressions to find a substring in a huge single-line text file - instead, you can read the file contents into memory in smaller chunks and then search through those - this way you don't need to store the entire file in memory at once.
Reading buffered text from a file is fairly straightforward:
Open a readable file stream
Create a StreamReader to read from the file stream
Start reading!
Then you just need to check whether:
The target substring is found in each chunk, or
The start of the target substring is partially found at the tail end of the current chunk
And then repeat until you find the substring, at which point you read the following 1000 characters.
Here's an example of how you could implement it as a script function (I've tried to explain the code in more detail in inline comments):
function Find-SubstringWithPostContext {
    [CmdletBinding(DefaultParameterSetName = 'wp')]
    param(
        # NOTE: the parameter names come from the function body; their types are assumed
        [Parameter(Mandatory = $true, ParameterSetName = 'lp', ValueFromPipelineByPropertyName = $true, ValueFromPipeline = $true)]
        [string[]]$LiteralPath,

        [Parameter(Mandatory = $true, ParameterSetName = 'wp', Position = 0)]
        [string[]]$Path,

        [Parameter(Mandatory = $true)]
        [ValidateLength(1, 5000)]
        [string]$Substring,

        [ValidateRange(2, 25000)]
        [int]$PostContext = 1000,

        [switch]$All
    )

    begin {
        # start by ensuring we'll be using a buffer that's at least 4 times larger than the
        # target substring to avoid too many tail searches
        $bufferSize = 2000
        while ($Substring.Length -gt $bufferSize / 4) {
            $bufferSize *= 2
        }
        $buffer = [char[]]::new($bufferSize)
    }

    process {
        if ($PSCmdlet.ParameterSetName -eq 'wp') {
            # resolve input paths if necessary
            $LiteralPath = $Path | Convert-Path
        }

        :fileLoop foreach ($lp in $LiteralPath) {
            $file = Get-Item -LiteralPath $lp
            # skip directories
            if ($file -isnot [System.IO.FileInfo]) { continue }

            try {
                $fileStream = $file.OpenRead()
                $scanner = [System.IO.StreamReader]::new($fileStream, $true)

                do {
                    # remember the current offset in the file, we'll need this later
                    $baseOffset = $fileStream.Position
                    # read a chunk from the file, convert to string
                    $readCount = $scanner.ReadBlock($buffer, 0, $bufferSize)
                    $string = [string]::new($buffer, 0, $readCount)
                    $eof = $readCount -lt $bufferSize

                    # test if target substring is found in the chunk we just read
                    $indexOfTarget = $string.IndexOf($Substring)
                    if ($indexOfTarget -ge 0) {
                        Write-Verbose "Substring found in chunk at local index ${indexOfTarget}"
                        # we found a match, ensure we've read enough post-context ahead of the given index
                        $tail = ''
                        if ($string.Length - $indexOfTarget -lt $PostContext -and $readCount -eq $bufferSize) {
                            # just like above, we read another chunk from the file and convert it to a proper string
                            $tailBuffer = [char[]]::new($PostContext - ($string.Length - $indexOfTarget))
                            $tailCount = $scanner.ReadBlock($tailBuffer, 0, $tailBuffer.Length)
                            $tail = [string]::new($tailBuffer, 0, $tailCount)
                        }

                        # construct and output the full post-context
                        $substringWithPostContext = $string.Substring($indexOfTarget) + $tail
                        if($substringWithPostContext.Length -gt $PostContext){
                            $substringWithPostContext = $substringWithPostContext.Remove($PostContext)
                        }

                        Write-Verbose "Writing output object ..."
                        Write-Output $([PSCustomObject]@{
                            FilePath = $file.FullName
                            Offset = $baseOffset + $indexOfTarget
                            Value = $substringWithPostContext
                        })

                        if (-not $All) {
                            # no need to search this file any further unless `-All` was specified
                            continue fileLoop
                        }
                        else {
                            # rewind to position after this match before next iteration
                            $rewindOffset = $indexOfTarget - $readCount
                            $null = $scanner.BaseStream.Seek($rewindOffset, [System.IO.SeekOrigin]::Current)
                        }
                    }
                    else {
                        # target was not found, but we may have "clipped" it in half,
                        # so figure out if target string could start at the end of current string chunk
                        for ($i = [Math]::Max(0, $string.Length - $Substring.Length); $i -lt $string.Length; $i++) {
                            # if the first character of the target substring isn't found then
                            # we might as well skip it immediately
                            if ($string[$i] -ne $Substring[0]) { continue }

                            if ($Substring.StartsWith($string.Substring($i))) {
                                # rewind file stream to this position so it'll get re-tested on
                                # the next iteration, then break out of tail search
                                $rewindOffset = $i - $string.Length
                                $null = $scanner.BaseStream.Seek($rewindOffset, [System.IO.SeekOrigin]::Current)
                                break
                            }
                        }
                    }
                } until ($eof)
            }
            finally {
                # remember to clean up after searching each file
                $scanner, $fileStream |Where-Object { $_ -is [System.IDisposable] } |ForEach-Object Dispose
            }
        }
    }
}
Now you can extract exactly 1000 characters after a substring is found with minimal memory allocation:
Get-ChildItem P*.txt |Find-SubstringWithPostContext -Substring '<ID>9671510841'
I haven't tested this enough to tell you if it works properly but it definitely was something fun to code. -Context here will give you the context based on characters before and after instead of lines. You can give it a try and let me know if it worked :)
Get-ChildItem *.txt | Find-String -Pattern 'mypattern'
Get-ChildItem *.txt | Find-String -Pattern 'mypattern' -Context 20, 20
Get-ChildItem *.txt | Find-String -Pattern 'mypattern' -AllMatches
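using namespace System.Text.RegularExpressions
using namespace System.IO

function Find-String {
    param(
        # NOTE: the parameter names come from the function body; their types are assumed
        [parameter(ValueFromPipeline, Mandatory)]
        [FileInfo]$Path,
        [parameter(Mandatory, Position = 0)]
        [string]$Pattern,
        [RegexOptions]$Options = 'IgnoreCase',
        [int[]]$Context,
        [switch]$AllMatches
    )
    process {
        $re = [regex]::new($Pattern, $Options)
        $content = [File]::ReadAllText($Path)
        $match = if($AllMatches.IsPresent) { $re.Matches($content) } else { $re.Match($content) }
        if($match.Success -notcontains $true) { return }
        foreach($m in $match) {
            $out = [ordered]@{
                Path = $Path.FullName
                Value = $m.Value
                Index = $m.Index
                Length = $m.Length
            }
            if($Context) {
                $before = $m.Index
                $after = $m.Index + $m.Length
                $contextBefore = $Context[0]
                $contextAfter = $Context[1]
                # widen the window one character at a time, stopping at the ends of the content
                while($contextBefore-- -and $before) { $before-- }
                while($contextAfter-- -and $after -lt $content.Length) { $after++ }
                $out.Context = (-join $content[$before..$after]).Trim()
            }
            [pscustomobject]$out
        }
    }
}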

I am very new to PowerShell scripting. I am trying to get SSAS Tabular model connection string details for multiple servers. I have code that returns them for a single server only. How can I modify the code to handle multiple servers?
$servername = "servername1"
# Connect SSAS Server
$server = New-Object Microsoft.AnalysisServices.Server
$server.Connect($servername)
$DSTable = @();
foreach ( $db in $server.databases) {
    $dbname = $db.Name
    $Srver = $db.ParentServer
    foreach ( $ds in $db.Model.DataSources) {
        $hash = @{
            "Server" = $Srver;
            "Model_Name" = $dbname ;
            "Datasource_Name" = $ds.Name ;
            "ConnectionString" = $ds.ConnectionString ;
            "ImpersonationMode" = $ds.ImpersonationMode;
            "Impersonation_Account" = $ds.Account;
        }
        $row = New-Object psobject -Property $hash
        $DSTable += $row
    }
}
As commented, you can surround the code you have in another foreach loop.
Using array concatenation with += is a bad idea, because on each addition, the entire array needs to be recreated in memory, so that is both time and memory consuming.
Best thing is to let PowerShell do the heavy lifting of collecting the data:
$allServers = 'server01','server02','server03' # etc. an array of servernames
# loop through the servers array and collect the output in variable $result
$result = foreach($servername in $allServers) {
    # Connect SSAS Server
    $server = New-Object Microsoft.AnalysisServices.Server
    $server.Connect($servername)
    foreach ( $db in $server.databases) {
        foreach ( $ds in $db.Model.DataSources) {
            # output an object with the desired properties
            [PSCustomObject]@{
                Server = $db.ParentServer
                Model_Name = $db.Name
                Datasource_Name = $ds.Name
                ConnectionString = $ds.ConnectionString
                ImpersonationMode = $ds.ImpersonationMode
                Impersonation_Account = $ds.Account
            }
        }
    }
}
# output on screen
$result | Out-GridView -Title 'SSAS connection string details'
# output to a CSV file (change the path and filename here of course..)
$result | Export-Csv -Path 'D:\Test\MySSAS_Connections.csv' -UseCulture -NoTypeInformation
The above uses the -UseCulture parameter so that the CSV delimiter matches what your machine expects when you double-click the file to open it in Excel. Without it, the default comma is used.

In a Windows command line environment, I'd like to be able to search a binary file for the last (final) occurrence of hex 06 char ("Ack") and truncate the file from that char to the end of the file, meaning that the found char is also trimmed off. How can I do that? The files can be several hundred megabytes in size.
EDIT: To be fair, I did quite a lot of Googling for code ideas, but my search terms are not bringing me to some kind of way to tackle this. Something like "search binary file for ASCII char hex 06, find last occurrence of that char and truncate the file from that point on," is so vague as to be essentially useless. I'll keep looking!
If you start reading bytes from the end of the file you will find the last ACK (if there is one). Knowing its position, you can now truncate the file.
I'm not good at PowerShell, so there might be some cmdlet I don't know about, but this achieves what you want:
$filename = "C:\temp\FindAck.txt"
$file = Get-Item $filename
$len = $file.Length
$blockSize = 32768
$buffer = new-object byte[] $blockSize
$found = $false
$blockNum = [math]::floor($len / $blockSize)
$mode = [System.IO.FileMode]::Open
$access = [System.IO.FileAccess]::Read
$sharing = [IO.FileShare]::Read
$fs = New-Object IO.FileStream($filename, $mode, $access, $sharing)
$foundPos = -1
while (!$found -and $blockNum -ge 0) {
    $fs.Position = $blockNum * $blockSize
    $bytesRead = $fs.Read($buffer, 0, $blocksize)
    if ($bytesRead -gt 0) {
        # scan the block backwards so the first hit is the last ACK in the file
        for ($i = $bytesRead -1; $i -ge 0; $i--) {
            if ($buffer[$i] -eq 6) {
                $foundPos = $blockNum * $blockSize + $i
                $found = $true
                break
            }
        }
    }
    $blockNum--
}
$fs.Close()
if ($foundPos -ne -1) {
    $mode = [System.IO.FileMode]::Open
    $access = [System.IO.FileAccess]::Write
    $sharing = [IO.FileShare]::Read
    $fs = New-Object IO.FileStream($filename, $mode, $access, $sharing)
    # cut the file off at the ACK, which removes the ACK itself as well
    $fs.SetLength($foundPos)
    $fs.Close()
}
Write-Host $foundPos
The idea of reading in 32KB blocks is to get a reasonable size chunk from the disk to process rather than reading one byte at a time.
I'm working on a script to extract data from BLOBs in a SQL database. The extraction process works great. I want to add some sort of progress indication to the script. I have a total record count from a SQL query, and an incremental counter that increases for each file exported. The incremental counter works, but the total record count - which I attempted to assign to a global variable - does not seem to hold its value. Am I declaring it incorrectly?
## Export of "larger" Sql Server Blob to file
## with GetBytes-Stream.
# Configuration data
$StartTime = Get-Date
$Server = "server";
$UserID = "user";
$Password = "password";
$Database = "db";
$Dest = "C:\Users\me\Desktop\Test\";
$bufferSize = 8192;
# Counts total rows
$CountSql = "SELECT Count(extension) as countall from
(
SELECT p.[people_id], right(pi.[file_name],4) as extension
FROM dbo.pictures as pi
INNER JOIN dbo.people AS p ON p.person_picture = pi.pictures_id
where left([image_type], 5) = 'image'
) as countall"
# Selects Data
$Sql = "SELECT p.[people_id], pi.[image_file], right(pi.[file_name],4), ROW_NUMBER() OVER (ORDER BY people_id) as count
FROM dbo.pictures as pi
INNER JOIN dbo.people AS p ON p.person_picture = pi.pictures_id
where left([image_type], 5) = 'image'";
# Open ADO.NET Connection
$con = New-Object Data.SqlClient.SqlConnection;
$con.ConnectionString = "Data Source=$Server;" +
"Integrated Security=False;" +
"User ID=$UserID;" +
"Password=$Password;" +
"Initial Catalog=$Database";
$con.Open();
# New Command and Reader for total row count
$CountCmd = New-Object Data.SqlClient.SqlCommand $CountSql, $con;
$crd = $CountCmd.ExecuteReader();
While ($crd.Read()) {
    $crd.GetValue($global:1)
}
$crd.Close();
# New Command and Reader for rest of data
$cmd = New-Object Data.SqlClient.SqlCommand $Sql, $con;
$rd = $cmd.ExecuteReader();
# Create a byte array for the stream.
$out = [array]::CreateInstance('Byte', $bufferSize)
# Looping through records
While ($rd.Read()) {
    $total = $global:1
    $counter = ($rd.GetValue(3));
    Write-Output ("Exporting $counter of $total`: {0}" -f $rd.GetGUID(0));
    # New BinaryWriter
    $fs = New-Object System.IO.FileStream ($Dest + $rd.GetGUID(0) + $rd.GetString(2)), Create, Write;
    $bw = New-Object System.IO.BinaryWriter $fs;
    $start = 0;
    # Read first byte stream
    $received = $rd.GetBytes(1, $start, $out, 0, $bufferSize - 1);
    While ($received -gt 0) {
        $bw.Write($out, 0, $received);
        $start += $received;
        # Read next byte stream
        $received = $rd.GetBytes(1, $start, $out, 0, $bufferSize - 1);
    }
    $bw.Close();
    $fs.Close();
}
# Closing & Disposing all objects
$rd.Close();
$con.Close();
$EndTime = Get-Date
$TotalTime = $EndTime - $StartTime
Write-Host ("Finished in {0:g}" -f $TotalTime)
PS C:\Users\me> C:\Scripts\ExportImagesFromNTST.ps1
Exporting 1 of : 3089b464-e667-4bf4-80b3-0002d582d4fa
Exporting 2 of : 04cf7738-ae19-4771-92b8-0003c5f27947
Exporting 3 of : 94485b5d-fe71-438d-a097-000ad185c915
and so on. 21380 should be $1 which should also be $total.
I think PetSerAl hit the nail on the head here. You create a SqlCommand object ($CountCmd), create a SqlDataReader ($crd) from it, and then call $crd's GetValue() method, which takes an integer telling it which column to return. But you pass it a global variable named '1', which is never defined, so you effectively pass $null and never get the value you want. I'm honestly surprised that it doesn't throw errors at you right there. You probably want to pass the integer 0 (column indexes are zero-based, and the count query returns a single column) and assign the result to $Total in the While loop. I'm honestly guessing here, but from what I see I think it should be:
$crd = $CountCmd.ExecuteReader();
While ($crd.Read()) {
    $Total = $crd.GetValue(0)
}
I'm pretty sure that will assign the first column's value for the current row to $Total (for that SQL command the result should be just 1 row with 1 column, the total count, right?). Then later you can reference $Total just fine to update your progress.
You, sir, need to look into the Write-Progress cmdlet if you want to track progress; it's perfect for your script.
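A minimal sketch of what that could look like, assuming $total already holds the row count. The value 5, the activity text, and the Start-Sleep call are placeholders standing in for the real count and the export work:

```powershell
$total = 5   # placeholder for the value read from the count query

for ($counter = 1; $counter -le $total; $counter++) {
    # PercentComplete must be between 0 and 100
    $percent = [int](100 * $counter / $total)
    Write-Progress -Activity 'Exporting images' -Status "Exporting $counter of $total" -PercentComplete $percent

    Start-Sleep -Milliseconds 100   # placeholder for exporting one file
}

# Remove the progress bar once the work is done
Write-Progress -Activity 'Exporting images' -Completed
```

In the export script the same Write-Progress call would go inside the While ($rd.Read()) loop, using the counter that is already read from the result set.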