How to chunk items from pipeline in PowerShell?

In my PowerShell cmdlet, I get an arbitrary number of items via the pipeline and want to return chunks of a specified number of items.
When, for example, my script gets this as input:
("A", "B", "C", "D", "E", "F", "G")
and I define, let's say, 4 as the chunk size, I'd like to return something like this:
(
("A", "B", "C", "D"),
("E", "F", "G")
)
Any help would be appreciated.

You can write a simple function that buffers N input objects before emitting a new array, then outputs any buffered values you might have left over when you reach the end of the input sequence:
function chunk {
    param(
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        [psobject]$InputObject,

        [ValidateRange(1, 100000)]
        [int]$ChunkSize = 4
    )
    begin {
        $counter = 0
        # Set up array that will act as buffer
        $chunk = [object[]]::new($ChunkSize)
    }
    process {
        # Add input object to next available slot in array
        $chunk[$counter++] = $InputObject
        if ($counter -eq $ChunkSize) {
            # If we've filled the buffer, output it as a new chunk
            Write-Output $chunk -NoEnumerate
            # Reset counter and buffer
            $counter = 0
            $chunk = [object[]]::new($ChunkSize)
        }
    }
    end {
        if ($counter) {
            # There's no more input but we have some data left over still, output it
            Write-Output $chunk[0..($counter-1)] -NoEnumerate
        }
    }
}
Now you can do:
PS ~> $firstChunk,$nextChunk = "A", "B", "C", "D", "E", "F", "G" | chunk
PS ~> $firstChunk
A
B
C
D
PS ~> $nextChunk
E
F
G
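When the number of chunks isn't known up front, assigning the pipeline output to a single variable collects them into an array of arrays. A quick sketch with assumed sample data:
$chunks = 1..10 | chunk -ChunkSize 3
$chunks.Count        # 4
$chunks[0] -join ',' # 1,2,3
$chunks[3] -join ',' # 10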

If you can save it to a file first, this can work using Get-Content's -ReadCount parameter. I couldn't keep the two lists wrapped with regular arrays and +=, so I used an ArrayList, hiding the output of ArrayList.Add(). I wish you could do named pipes on the fly like in zsh.
echo A B C D E F G | set-content file # PS7: 'A'..'G'
get-content file -ReadCount 4 |
% { [collections.arraylist]$list = @() } { $list.add($_) > $null }
$list[0]
A
B
C
D
$list[1]
E
F
G
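If you only need to stream the chunks rather than keep them in a list, -ReadCount alone does the grouping; a minimal sketch assuming the same file:
get-content file -ReadCount 4 | % { "chunk: $($_ -join ',')" }
# chunk: A,B,C,D
# chunk: E,F,G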

As of PowerShell 7, and if you have .NET 6 installed:
$arr = [string[]]@("A", "B", "C", "D", "E", "F", "G")
$chunks = [System.Linq.Enumerable]::Chunk($arr, 4)
It may be used like so:
foreach ($chunk in $chunks) {
"hello " + $chunk
}
Which outputs as:
hello A B C D
hello E F G
The Chunk method takes any IEnumerable<T> as its first argument.
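So it works on other element types as well; a small sketch, assuming PowerShell 7 on .NET 6 or later:
$nums = [int[]](1..7)
foreach ($c in [System.Linq.Enumerable]::Chunk($nums, 3)) { $c -join '+' }
# 1+2+3
# 4+5+6
# 7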
EDIT: Meh, had to do this with PowerShell 5 :(
So for PowerShell 5, the accepted answer is good, but if you want a type-safe approach:
function chunk {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        [System.Collections.IEnumerable]$Iter,

        [Parameter(Mandatory = $true)]
        [ValidateRange(1, [uint32]::MaxValue)] # minimum of 1; a ChunkSize of 0 would cause a division by zero below
        [uint32]$ChunkSize
    )
    $type = $Iter[0].GetType().Name
    $numer = $Iter.GetEnumerator()
    $chunks = New-Object System.Collections.Generic.List[System.Collections.Generic.List[$($type)]]
    while ($numer.MoveNext()) {
        $i = 1
        $chunk = New-Object System.Collections.Generic.List[$($type)]
        $chunk.Add($numer.Current)
        while ($i % $ChunkSize -gt 0 -and $numer.MoveNext()) {
            $i++
            $chunk.Add($numer.Current)
        }
        $chunks.Add($chunk)
    }
    return , $chunks
}
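A quick usage sketch (note that, as written, the function is most reliably called with the collection as an argument rather than via the pipeline):
$chunks = chunk -Iter @('A', 'B', 'C', 'D', 'E') -ChunkSize 2
$chunks.Count        # 3
$chunks[0] -join ',' # A,B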

Related

Stream just part of a file using PowerShell and compute hash

I need to be able to identify some large binary files which have been copied and renamed between secure servers. To do this, I would like to be able to hash the first X bytes and the last X bytes of all the files. I need to do this with only what is available on a standard Windows 10 system with no additional software installed, so PowerShell seems like the right choice.
Some things that don't work:
I cannot read the entire file in, then extract the parts of the file I want to hash. The objective I'm trying to achieve is to minimize the amount of the file I need to read, and reading the entire file defeats that purpose.
Reading moderately large portions of a file into a PowerShell variable appears to be pretty slow, so $hash.ComputeHash($moderatelyLargeVariable) doesn't seem like a viable solution.
I'm pretty sure I need to do $hash.ComputeHash($stream) where $stream only streams part of the file.
Thus far I've tried:
function Get-FileStreamHash {
    param (
        $FilePath,
        $Algorithm
    )
    $hash = [Security.Cryptography.HashAlgorithm]::Create($Algorithm)
    ## METHOD 0: See description below
    $stream = ([IO.StreamReader]"${FilePath}").BaseStream
    $hashValue = $hash.ComputeHash($stream)
    ## END of part I need help with
    # Convert to a hexadecimal string
    $hexHashValue = -join ($hashValue | ForEach-Object { "{0:x2}" -f $_ })
    $stream.Close()
    # return
    $hexHashValue
}
Method 0: This works, but it's streaming the whole file and thus doesn't solve my problem. For a 3GB file this takes about 7 seconds on my machine.
Method 1: $hashValue = $hash.ComputeHash((Get-Content -Path $FilePath -Stream "")). This also is streaming the whole file, and it also takes forever. For the same 3GB file it takes something longer than 5 minutes (I cancelled at that point, and don't know what the total duration would be).
Method 2: $hashValue = $hash.ComputeHash((Get-Content -Path $FilePath -Encoding byte -TotalCount $qtyBytes -Stream "")). This is the same as Method 1, except that it limits the content to $qtyBytes. At 1000000 (1MB) it takes 18 seconds. I think that means Method 1 would have taken ~15 hours, 7700x slower than Method 0.
Is there a way to do something like Method 2 (limit what is read) but without the slow down? And if so, is there a good way to do it on just the end of the file?
Thanks!
You could try one (or a combination of both) of the following helper functions to read a number of bytes from the beginning of the file or taken from the end:
function Read-FirstBytes {
    param (
        [Parameter(Mandatory = $true, ValueFromPipeline = $true, ValueFromPipelineByPropertyName = $true, Position = 0)]
        [Alias('FullName', 'FilePath')]
        [ValidateScript({ Test-Path -Path $_ -PathType Leaf })]
        [string]$Path,

        [Parameter(Mandatory = $true, Position = 1)]
        [int]$Bytes,

        [ValidateSet('ByteArray', 'HexString', 'Base64')]
        [string]$As = 'ByteArray'
    )
    try {
        $stream = [System.IO.File]::OpenRead($Path)
        $length = [math]::Min([math]::Abs($Bytes), $stream.Length)
        $buffer = [byte[]]::new($length)
        $null = $stream.Read($buffer, 0, $length)
        switch ($As) {
            'HexString' { ($buffer | ForEach-Object { "{0:x2}" -f $_ }) -join '' ; break }
            'Base64'    { [Convert]::ToBase64String($buffer) ; break }
            default     { ,$buffer }
        }
    }
    catch { throw }
    finally { $stream.Dispose() }
}
function Read-LastBytes {
    param (
        [Parameter(Mandatory = $true, ValueFromPipeline = $true, ValueFromPipelineByPropertyName = $true, Position = 0)]
        [Alias('FullName', 'FilePath')]
        [ValidateScript({ Test-Path -Path $_ -PathType Leaf })]
        [string]$Path,

        [Parameter(Mandatory = $true, Position = 1)]
        [int]$Bytes,

        [ValidateSet('ByteArray', 'HexString', 'Base64')]
        [string]$As = 'ByteArray'
    )
    try {
        $stream = [System.IO.File]::OpenRead($Path)
        $length = [math]::Min([math]::Abs($Bytes), $stream.Length)
        $null = $stream.Seek(-$length, 'End')
        $buffer = for ($i = 0; $i -lt $length; $i++) { $stream.ReadByte() }
        switch ($As) {
            'HexString' { ($buffer | ForEach-Object { "{0:x2}" -f $_ }) -join '' ; break }
            'Base64'    { [Convert]::ToBase64String($buffer) ; break }
            default     { ,[Byte[]]$buffer }
        }
    }
    catch { throw }
    finally { $stream.Dispose() }
}
Then you can compute a hash value from it and format it as you like.
Combinations are possible, like:
$begin = Read-FirstBytes -Path 'D:\Test\somefile.dat' -Bytes 50 # take the first 50 bytes
$end = Read-LastBytes -Path 'D:\Test\somefile.dat' -Bytes 1000 # and the last 1000 bytes
$Algorithm = 'MD5'
$hash = [Security.Cryptography.HashAlgorithm]::Create($Algorithm)
$hashValue = $hash.ComputeHash($begin + $end)
($hashValue | ForEach-Object { "{0:x2}" -f $_ }) -join ''
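To tie the helpers together, a sketch of a wrapper function (the name and defaults are assumptions, built on the two functions above):
function Get-PartialFileHash {
    param(
        [string]$Path,
        [int]$Bytes = 1000,
        [string]$Algorithm = 'MD5'
    )
    # Hash the first and last $Bytes of the file (hypothetical helper)
    $first = Read-FirstBytes -Path $Path -Bytes $Bytes
    $last  = Read-LastBytes  -Path $Path -Bytes $Bytes
    $hash  = [Security.Cryptography.HashAlgorithm]::Create($Algorithm)
    ($hash.ComputeHash([byte[]]($first + $last)) | ForEach-Object { "{0:x2}" -f $_ }) -join ''
}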
I believe this would be a more efficient way of reading the last bytes of your file using System.IO.BinaryReader. You can combine this function with the function you have, it can read all bytes, last n bytes (-Last) or first n bytes (-First).
function Read-Bytes {
    [CmdletBinding(DefaultParameterSetName = 'Path')]
    param(
        [Parameter(
            Mandatory,
            ValueFromPipelineByPropertyName,
            ParameterSetName = 'Path',
            Position = 0
        )]
        [Alias('FullName')]
        [ValidateScript({
            if (Test-Path $_ -PathType Leaf) {
                return $true
            }
            throw 'Invalid File Path'
        })]
        [System.IO.FileInfo]$Path,

        [Parameter(
            HelpMessage = 'Specifies the number of Bytes from the beginning of a file.',
            ParameterSetName = 'FirstBytes',
            Position = 1
        )]
        [int64]$First,

        [Parameter(
            HelpMessage = 'Specifies the number of Bytes from the end of a file.',
            ParameterSetName = 'LastBytes',
            Position = 1
        )]
        [int64]$Last
    )
    process {
        try {
            $reader = [System.IO.BinaryReader]::new(
                [System.IO.File]::Open(
                    $Path.FullName,
                    [System.IO.FileMode]::Open,
                    [System.IO.FileAccess]::Read
                )
            )
            $stream = $reader.BaseStream
            $length = (
                $stream.Length, $First
            )[[int]($First -lt $stream.Length -and $First)]
            $stream.Position = (
                0, ($length - $Last)
            )[[int]($length -gt $Last -and $Last)]
            $bytes = while ($stream.Position -ne $length) {
                $stream.ReadByte()
            }
            [pscustomobject]@{
                FilePath = $Path.FullName
                Length   = $length
                Bytes    = $bytes
            }
        }
        catch {
            Write-Warning $_.Exception.Message
        }
        finally {
            $reader.Close()
            $reader.Dispose()
        }
    }
}
Usage
Get-ChildItem . -File | Read-Bytes -Last 100: Reads the last 100 bytes of all files on the current folder. If the -Last argument exceeds the file length, it reads the entire file.
Get-ChildItem . -File | Read-Bytes -First 100: Reads the first 100 bytes of all files on the current folder. If the -First argument exceeds the file length, it reads the entire file.
Read-Bytes -Path path/to/file.ext: Reads all bytes of file.ext.
Output
Returns an object with the properties FilePath, Length, Bytes.
FilePath                         Length Bytes
--------                         ------ -----
/home/user/Documents/test/......     14 {73, 32, 119, 111…}
/home/user/Documents/test/......      0
/home/user/Documents/test/......      0
/home/user/Documents/test/......      0
/home/user/Documents/test/......    116 {111, 109, 101, 95…}
/home/user/Documents/test/......  17963 {50, 101, 101, 53…}
/home/user/Documents/test/......   3617 {105, 32, 110, 111…}
/home/user/Documents/test/......    638 {101, 109, 112, 116…}
/home/user/Documents/test/......      0
/home/user/Documents/test/......     36 {65, 99, 114, 101…}
/home/user/Documents/test/......    735 {117, 112, 46, 79…}
/home/user/Documents/test/......   1857 {108, 111, 115, 101…}
/home/user/Documents/test/......     77 {79, 80, 69, 78…}
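To connect this back to the hashing question, a sketch of hashing the returned bytes (the file path is an assumed sample):
$r = Read-Bytes -Path 'D:\Test\somefile.dat' -Last 100
$md5 = [Security.Cryptography.HashAlgorithm]::Create('MD5')
($md5.ComputeHash([byte[]]$r.Bytes) | ForEach-Object { "{0:x2}" -f $_ }) -join ''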

Creating Arraylist of Arraylist by slicing existing arraylist

I have the following variable defined
$A = New-Object -TypeName "System.Collections.ArrayList"
Now I add n elements to it :
$A.Add(1..n)
Now I want to divide $A into p parts of k elements each (the last one might have fewer elements if p*k > $A.Count).
How do I do that?
You can use a function to split an array into several smaller arrays.
Below is a slightly adapted version of that function found here:
function Split-Array {
    [CmdletBinding(DefaultParameterSetName = 'ByChunkSize')]
    Param(
        [Parameter(Mandatory = $true, Position = 0)]
        $Array,

        [Parameter(Mandatory = $true, Position = 1, ParameterSetName = 'ByChunkSize')]
        [ValidateRange(1, [int]::MaxValue)]
        [int]$ChunkSize,

        [Parameter(Mandatory = $true, Position = 1, ParameterSetName = 'ByParts')]
        [ValidateRange(1, [int]::MaxValue)]
        [int]$Parts
    )
    $items = $Array.Count
    switch ($PsCmdlet.ParameterSetName) {
        'ByChunkSize' { $Parts = [Math]::Ceiling($items / $ChunkSize) }
        'ByParts'     { $ChunkSize = [Math]::Ceiling($items / $Parts) }
        default       { throw "Split-Array: You must use either the Parts or the ChunkSize parameter" }
    }
    # when the given ChunkSize is larger or equal to the number of items in the array
    # use TWO unary commas to return the array as single sub array of the result.
    if ($ChunkSize -ge $items) { return ,,$Array }
    $result = for ($i = 1; $i -le $Parts; $i++) {
        $first = ($i - 1) * $ChunkSize
        $last  = [Math]::Min(($i * $ChunkSize) - 1, $items - 1)
        ,$Array[$first..$last]
    }
    return ,$result
}
In your case you could use it like:
$p = 4 # the number of parts you want
$subArrays = Split-Array $A.ToArray() -Parts $p
or
$k = 4 # the max number of items in each part
$subArrays = Split-Array $A.ToArray() -ChunkSize $k
Here is a function I came up with to chunk your System.Collections.ArrayList into a nested ArrayList of p parts. It uses a System.Collections.Specialized.OrderedDictionary to group the size-k chunks by index / chunk size, rounded down to the nearest integer using System.Math.Floor. It then fetches only the groups with keys from 0 up to (but not including) $Parts.
function Split-ArrayList {
    [CmdletBinding()]
    param (
        # Arraylist to slice
        [Parameter(Mandatory = $true)]
        [System.Collections.ArrayList]
        $ArrayList,

        # Chunk size per part
        [Parameter(Mandatory = $true)]
        [ValidateRange(1, [int]::MaxValue)]
        [int]
        $ChunkSize,

        # Number of parts
        [Parameter(Mandatory = $true)]
        [ValidateRange(1, [int]::MaxValue)]
        [int]
        $Parts
    )
    # Group chunks into hashtable
    $chunkGroups = [ordered]@{}
    for ($i = 0; $i -lt $ArrayList.Count; $i++) {
        # Get the hashtable key by dividing the index by the chunk size
        # Round down to nearest integer using Math.Floor
        [int]$key = [Math]::Floor($i / $ChunkSize)
        # Add new arraylist for key if it doesn't exist
        # ContainsKey is not supported for ordered dictionary
        if ($chunkGroups.Keys -notcontains $key) {
            $chunkGroups.Add($key, [System.Collections.ArrayList]::new())
        }
        # Add number to hashtable
        [void]$chunkGroups[$key].Add($ArrayList[$i])
    }
    # Create nested ArrayList of parts
    $result = [System.Collections.ArrayList]::new()
    for ($key = 0; $key -lt $Parts; $key++) {
        [void]$result.Add($chunkGroups[$key])
    }
    $result
}
Usage:
$A = [System.Collections.ArrayList]::new(1..10)
Split-ArrayList -ArrayList $A -ChunkSize 4 -Parts 1 |
ForEach-Object { "{ " + ($_ -join ", ") + " }" }
# { 1, 2, 3, 4 }
Split-ArrayList -ArrayList $A -ChunkSize 4 -Parts 2 |
ForEach-Object { "{ " + ($_ -join ", ") + " }" }
# { 1, 2, 3, 4 }
# { 5, 6, 7, 8 }
Split-ArrayList -ArrayList $A -ChunkSize 4 -Parts 3 |
ForEach-Object { "{ " + ($_ -join ", ") + " }" }
# { 1, 2, 3, 4 }
# { 5, 6, 7, 8 }
# { 9, 10 }
Note: I didn't really account for the cases where you might want to exclude Parts, so I made every parameter mandatory. You can amend the function to be more flexible with different inputs.

PowerShell function to chunk the pipe line objects into arrays can't get the correct result

I'm learning PowerShell and trying to write a function to chunk the pipeline objects into arrays. If the user provides a scriptblock $Process, the function will apply the scriptblock to each of the pipeline objects before sending them out to the pipeline (not implemented yet in the code below). So, given the parameter $InputObject as 1, 2, 3, 4, 5 and $ElementsPerChunk as 2, the function should return 3 arrays: @(1, 2), @(3, 4), @(5). Below is my current code:
function Chunk-Object
{
    [CmdletBinding()]
    Param (
        [Parameter(Mandatory = $true,
                   ValueFromPipeline = $true,
                   ValueFromPipelineByPropertyName = $true)] [object[]] $InputObject,
        [Parameter()] [scriptblock] $Process,
        [Parameter()] [int] $ElementsPerChunk
    )
    Begin {
        $cache = @();
        $index = 0;
    }
    Process {
        if ($cache.Length -eq $ElementsPerChunk) {
            # if we collected $ElementsPerChunk elements in an array, send it out to the pipeline
            Write-Output $cache;
            # Then add the current pipeline object to the array and set $index to 1
            $cache = @($_);
            $index = 1;
        }
        else {
            $cache += $_;
            $index++;
        }
    }
    End {
        # Check if there is anything still in $cache; if so, send it out to the pipeline
        if ($cache) {
            Write-Output $cache;
        }
    }
}
echo 1 2 3 4 5 6 7 | Chunk-Object -ElementsPerChunk 2;
Write-Host "=============================================================================";
(echo 1 2 3 4 5 6 7 | Chunk-Object -ElementsPerChunk 2).gettype();
Write-Host "=============================================================================";
(echo 1 2 3 4 5 6 7 | Chunk-Object -ElementsPerChunk 2).length;
When I execute the code, I get:
1
2
3
4
5
6
7
=============================================================================
IsPublic IsSerial Name     BaseType
-------- -------- ----     --------
True     True     Object[] System.Array
=============================================================================
7
As you can see, the result is a single flat array containing 7 elements. What I expected is 4 elements: @(1, 2), @(3, 4), @(5, 6), @(7). Can anyone help me check my code and explain the issue? Thanks.
The problem is that the pipeline is "unrolling" each of your chunk arrays as you return them from the process block. Try making this change in the Process block where you return the array:
Write-Output (,$cache);
The unary comma wraps $cache in a one-element outer array; the pipeline unrolls that outer wrapper and emits $cache itself as a single object, so each chunk arrives intact instead of being flattened into individual elements.
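A minimal sketch of the difference, using hypothetical helper functions:
function Emit-Plain { Write-Output @(1, 2) }      # the pipeline unrolls the array -> 2 objects
function Emit-Wrapped { Write-Output (,@(1, 2)) } # only the outer wrapper unrolls -> 1 object (the array)
(Emit-Plain | Measure-Object).Count    # 2
(Emit-Wrapped | Measure-Object).Count  # 1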

Merging hashtables in PowerShell: how?

I am trying to merge two hashtables, overwriting key-value pairs in the first if the same key exists in the second.
To do this I wrote this function, which first removes all key-value pairs in the first hashtable if the same key exists in the second hashtable.
When I type this into PowerShell line by line it works. But when I run the entire function, PowerShell asks me to provide (what it considers) missing parameters to foreach-object.
function mergehashtables($htold, $htnew)
{
$htold.getenumerator() | foreach-object
{
$key = $_.key
if ($htnew.containskey($key))
{
$htold.remove($key)
}
}
$htnew = $htold + $htnew
return $htnew
}
Output:
PS C:\> mergehashtables $ht $ht2
cmdlet ForEach-Object at command pipeline position 1
Supply values for the following parameters:
Process[0]:
$ht and $ht2 are hashtables containing two key-value pairs each, one of them with the key "name" in both hashtables.
What am I doing wrong?
Merge-Hashtables
Instead of removing keys, you might consider simply overwriting them:
$h1 = @{a = 9; b = 8; c = 7}
$h2 = @{b = 6; c = 5; d = 4}
$h3 = @{c = 3; d = 2; e = 1}
Function Merge-Hashtables {
    $Output = @{}
    ForEach ($Hashtable in ($Input + $Args)) {
        If ($Hashtable -is [Hashtable]) {
            ForEach ($Key in $Hashtable.Keys) { $Output.$Key = $Hashtable.$Key }
        }
    }
    $Output
}
For this cmdlet you can use several syntaxes and you are not limited to two input tables:
Using the pipeline: $h1, $h2, $h3 | Merge-Hashtables
Using arguments: Merge-Hashtables $h1 $h2 $h3
Or a combination: $h1 | Merge-Hashtables $h2 $h3
All above examples return the same hash table:
Name Value
---- -----
e    1
d    2
b    6
c    3
a    9
If there are any duplicate keys in the supplied hash tables, the value of the last hash table is taken.
(Added 2017-07-09)
Merge-Hashtables version 2
In general, I prefer more global functions which can be customized with parameters for specific needs, as in the original question: "overwriting key-value pairs in the first if the same key exists in the second". Why let the last one overrule and not the first? Why remove anything at all? Maybe someone else wants to merge or join the values, or take the largest value, or just the average...
The version below no longer supports supplying hash tables as arguments (you can only pipe hash tables to the function), but it has a parameter that lets you decide how to treat the value array of duplicate entries, by operating on the value array assigned to the hash key presented in the current object ($_).
Function
Function Merge-Hashtables([ScriptBlock]$Operator) {
    $Output = @{}
    ForEach ($Hashtable in $Input) {
        If ($Hashtable -is [Hashtable]) {
            ForEach ($Key in $Hashtable.Keys) {
                $Output.$Key = If ($Output.ContainsKey($Key)) { @($Output.$Key) + $Hashtable.$Key } Else { $Hashtable.$Key }
            }
        }
    }
    If ($Operator) {
        ForEach ($Key in @($Output.Keys)) { $_ = @($Output.$Key); $Output.$Key = Invoke-Command $Operator }
    }
    $Output
}
Syntax
HashTable[] <Hashtables> | Merge-Hashtables [-Operator <ScriptBlock>]
Default
By default, all values from duplicated hash table entries will be added to an array:
PS C:\> $h1, $h2, $h3 | Merge-Hashtables
Name Value
---- -----
e    1
d    {4, 2}
b    {8, 6}
c    {7, 5, 3}
a    9
Examples
To get the same result as version 1 (using the last values) use the command: $h1, $h2, $h3 | Merge-Hashtables {$_[-1]}. If you would like to use the first values instead, the command is: $h1, $h2, $h3 | Merge-Hashtables {$_[0]} or the largest values: $h1, $h2, $h3 | Merge-Hashtables {($_ | Measure-Object -Maximum).Maximum}.
More examples:
PS C:\> $h1, $h2, $h3 | Merge-Hashtables {($_ | Measure-Object -Average).Average} # Take the average values
Name Value
---- -----
e    1
d    3
b    7
c    5
a    9
PS C:\> $h1, $h2, $h3 | Merge-Hashtables {$_ -Join ""} # Join the values together
Name Value
---- -----
e    1
d    42
b    86
c    753
a    9
PS C:\> $h1, $h2, $h3 | Merge-Hashtables {$_ | Sort-Object} # Sort the values list
Name Value
---- -----
e    1
d    {2, 4}
b    {6, 8}
c    {3, 5, 7}
a    9
I see two problems:
The open brace should be on the same line as Foreach-object
You shouldn't modify a collection while enumerating through a collection
The example below illustrates how to fix both issues:
function mergehashtables($htold, $htnew)
{
    $keys = $htold.getenumerator() | foreach-object {$_.key}
    $keys | foreach-object {
        $key = $_
        if ($htnew.containskey($key))
        {
            $htold.remove($key)
        }
    }
    $htnew = $htold + $htnew
    return $htnew
}
Not a new answer; this is functionally the same as @Josh-Petitt's answer, with improvements.
In this answer:
Merge-HashTable uses the correct PowerShell syntax if you want to drop this into a module
The original wasn't idempotent: I added cloning of the hashtable input, otherwise your input was clobbered, which was not the intention
added a proper example of usage
function Merge-HashTable {
    param(
        [hashtable] $default, # Your original set
        [hashtable] $uppend   # The set you want to update/append to the original set
    )
    # Clone for idempotence
    $default1 = $default.Clone();
    # We need to remove any key-value pairs in $default1 that we will
    # be replacing with key-value pairs from $uppend
    foreach ($key in $uppend.Keys) {
        if ($default1.ContainsKey($key)) {
            $default1.Remove($key);
        }
    }
    # Union both sets
    return $default1 + $uppend;
}

# Real-life example of dealing with IIS AppPool parameters
$defaults = @{
    enable32BitAppOnWin64 = $false;
    runtime = "v4.0";
    pipeline = 1;
    idleTimeout = "1.00:00:00";
};
$options1 = @{ pipeline = 0; };
$options2 = @{ enable32BitAppOnWin64 = $true; pipeline = 0; };

$results1 = Merge-HashTable -default $defaults -uppend $options1;
# Name                  Value
# ----                  -----
# enable32BitAppOnWin64 False
# runtime               v4.0
# idleTimeout           1.00:00:00
# pipeline              0

$results2 = Merge-HashTable -default $defaults -uppend $options2;
# Name                  Value
# ----                  -----
# idleTimeout           1.00:00:00
# runtime               v4.0
# enable32BitAppOnWin64 True
# pipeline              0
In case you want to merge the whole hashtable tree
function Join-HashTableTree {
    param (
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        [hashtable]
        $SourceHashtable,

        [Parameter(Mandatory = $true, Position = 0)]
        [hashtable]
        $JoinedHashtable
    )
    $output = $SourceHashtable.Clone()
    foreach ($key in $JoinedHashtable.Keys) {
        $oldValue = $output[$key]
        $newValue = $JoinedHashtable[$key]
        $output[$key] =
            if ($oldValue -is [hashtable] -and $newValue -is [hashtable]) { $oldValue | ~+ $newValue }
            elseif ($oldValue -is [array] -and $newValue -is [array]) { $oldValue + $newValue }
            else { $newValue }
    }
    $output;
}
Then, it can be used like this:
Set-Alias -Name '~+' -Value Join-HashTableTree -Option AllScope
@{
    a = 1;
    b = @{
        ba = 2;
        bb = 3
    };
    c = @{
        val = 'value1';
        arr = @(
            'Foo'
        )
    }
} |
~+ @{
    b = @{
        bb = 33;
        bc = 'hello'
    };
    c = @{
        arr = @(
            'Bar'
        )
    };
    d = @(
        42
    )
} |
ConvertTo-Json
It will produce the following output:
{
    "a": 1,
    "d": 42,
    "c": {
        "val": "value1",
        "arr": [
            "Foo",
            "Bar"
        ]
    },
    "b": {
        "bb": 33,
        "ba": 2,
        "bc": "hello"
    }
}
I just needed to do this and found this works:
$HT += $HT2
The contents of $HT2 get added to the contents of $HT. Note that the + operator throws an error when the same key exists in both hashtables, so this only works if the key sets are disjoint.
The open brace has to be on the same line as ForEach-Object or you have to use the line continuation character (backtick).
This is the case because the code within { ... } is really the value for the -Process parameter of ForEach-Object cmdlet.
-Process <ScriptBlock[]>
Specifies the script block that is applied to each incoming object.
This will get you past the current issue at hand.
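A minimal sketch of both working forms, assuming $ht is a hashtable:
$ht.GetEnumerator() | ForEach-Object { $_.Key }   # brace on the same line
$ht.GetEnumerator() | ForEach-Object `
{
    $_.Key
}                                                 # or continue the line with a backtick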
I think the most compact code to merge (without overwriting existing keys) would be this:
function Merge-Hashtables($htold, $htnew)
{
    $htnew.keys | where {$_ -notin $htold.keys} | foreach {$htold[$_] = $htnew[$_]}
}
I borrowed it from Union and Intersection of Hashtables in PowerShell
I wanted to point out that one should not reference base properties of the hashtable indiscriminately in generic functions, as they may have been overridden (shadowed) by items of the hashtable.
For instance, the hashtable $hash = @{'keys' = 'lots of them'} will have the base hashtable property Keys shadowed by the item keys, so doing foreach ($key in $hash.Keys) will enumerate the value of the item keys instead of the base property Keys.
Instead, functions that cannot know whether the base properties have been overridden should use the GetEnumerator method or the Keys property of the PSBase property, neither of which can be overridden.
Thus, Jon Z's answer is the best.
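A short sketch of the shadowing behavior described above:
$hash = @{ keys = 'lots of them' }
$hash.Keys                                        # 'lots of them' -- the item shadows the property
$hash.PSBase.Keys                                 # the real key collection: 'keys'
foreach ($e in $hash.GetEnumerator()) { $e.Key }  # also safe: outputs 'keys'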
To 'inherit' key-values from a parent hashtable ($htOld) into child hashtables ($htNew), without modifying the values of already existing keys in the child hashtables:
function MergeHashtable($htOld, $htNew)
{
    $htOld.Keys | %{
        if (!$htNew.ContainsKey($_)) {
            $htNew[$_] = $htOld[$_];
        }
    }
    return $htNew;
}
Please note that this will modify the $htNew object.
Here is a function version that doesn't use the pipeline (not that the pipeline is bad, just another way to do it). It also returns a merged hashtable and leaves the originals unchanged:
function MergeHashtable($a, $b)
{
    # Clone $a so the original hashtable is left unchanged
    $merged = $a.Clone()
    foreach ($k in $b.Keys)
    {
        if ($merged.ContainsKey($k))
        {
            $merged.Remove($k)
        }
    }
    return $merged + $b
}
I just wanted to expand on or simplify jon Z's answer. There just seem to be too many lines and missed opportunities to use Where-Object. Here is my simplified version:
Function merge_hashtables($htold, $htnew) {
    # Snapshot the keys into an array first; removing entries while
    # enumerating the live Keys collection would throw an error
    @($htold.Keys) | ? { $htnew.ContainsKey($_) } | % {
        $htold.Remove($_)
    }
    $htold += $htnew
    return $htold
}
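A quick usage sketch with assumed sample data:
$old = @{ name = 'a'; size = 1 }
$new = @{ name = 'b'; color = 'red' }
merge_hashtables $old $new   # name = b, size = 1, color = red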