I am having a little bit of trouble with hashtables/dictionaries in PowerShell. The most recent roadblock is finding the index of a key in an ordered dictionary.
I am looking for a solution that isn't simply iterating through the object.
(I already know how to do that)
Consider the following example:
$dictionary = [Ordered]@{
'a' = 'blue';
'b'='green';
'c'='red'
}
If this were a normal array I'd be able to look up the index of an entry by using IndexOf().
[array]::IndexOf($dictionary,'c').
That would return 2 under normal circumstances.
If I try that with an ordered dictionary, though, I get -1.
Any solutions?
Edit:
In case anyone reading over this is wondering what I'm talking about: what I was trying to use this for was to create an object that normalizes property entries in a way that also has a numerical order.
I was trying to use this for the status of a process, for example:
$_processState = [Ordered]@{
'error' = 'error'
'none' = 'none'
'started' = 'started'
'paused' = 'paused'
'cleanup' = 'cleanup'
'complete' = 'complete'
}
If you were able to easily do this, the above object would give $_processState.error an index value of 0 and ascend through each entry, finally giving $_processState.complete an index value of 5. Then if you compared two properties, by "index value", you could see which one is further along by simple operators. For instance:
$thisObject.Status = $_processState.complete
If ($thisObject.Status -ge $_processState.cleanup) {Write-Host 'All done!'}
PS > All done!
^^that doesn't work as is, but that's the idea. It's what I was aiming for. Or maybe to find something like $_processState.complete.IndexNumber()
Having an object like this also lets you assign values by the index name, itself, while standardizing the options...
$thisObject.Status = $_processState.paused
$thisObject.Status
PS > paused
Not really sure this was the best approach at the time or if it still is the best approach with all the custom class options there are available in PS v5.
It can be simpler
It may not be any more efficient than the answer from Frode F., but a more concise (inline) option is to put the hash table's keys collection in a subexpression ($()) and then call IndexOf on the result. The subexpression enumerates the keys into an array, and an array does have an IndexOf method.
For your hash table...
Your particular expression would be simply:
$($dictionary.keys).indexOf('c')
...which gives the value 2 as you expected. This also works just as well on a regular hashtable... unless the hashtable is modified in pretty much any way, of course... so it's probably not very useful in that case.
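If the lookup is needed in several places, the one-liner can be wrapped in a small helper. This is only a sketch: the name Get-KeyIndex is made up for illustration, and it uses the static [array]::IndexOf() on a copy of the keys. That method compares with Equals(), so the match is case-sensitive and type-sensitive, unlike key lookup in an [ordered] dictionary.

```powershell
# Hypothetical helper (not part of the original answer):
# returns the position of a key, or -1 if the key is absent.
function Get-KeyIndex {
    param(
        [System.Collections.Specialized.OrderedDictionary] $Dictionary,
        $Key
    )
    # @(...) copies the keys, in insertion order, into an object[] array
    [array]::IndexOf(@($Dictionary.Keys), $Key)
}

$dictionary = [ordered]@{ 'a' = 'blue'; 'b' = 'green'; 'c' = 'red' }

Get-KeyIndex $dictionary 'c'   # 2
Get-KeyIndex $dictionary 'C'   # -1 (comparison is case-sensitive)
Get-KeyIndex $dictionary 'z'   # -1 (no such key)
```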
In other words
Using this hash table (which also shows many of the ways to encode 4...):
$hashtable = [ordered]@{
sample = 'hash table'
0 = 'hello'
1 = 'goodbye'
[char]'4' = 'the ansi character 4 (code 52)'
[char]4 = 'the ansi character code 4'
[int]4 = 'the integer 4'
'4' = 'a string containing only the character 4'
5 = "nothing of importance"
}
would yield the following expression/results pairs:
# Expression                                   Result
#--------------------------------------------  ------
$($hashtable.keys).indexof('5')                -1
$($hashtable.keys).indexof(5)                  7
$($hashtable.keys).indexof('4')                6
$($hashtable.keys).indexof([char]4)            4
$($hashtable.keys).indexof([int]4)             5
$($hashtable.keys).indexof([char]'4')          3
$($hashtable.keys).indexof([int][char]'4')     -1
$($hashtable.keys).indexof('sample')           0
by the way:
[int][char]'4' equals [int]52
[char]'4' has a "value" (magnitude?) of 52, but is a character, so it's used as such
...gotta love the typing system, which, while flexible, can get really really bad at times, if you're not careful.
Dictionaries use keys, not indexes. OrderedDictionary combines a hashtable and an ArrayList to give you order/index support in a dictionary; however, it's still a dictionary (key-based) collection.
If you need to get the index of an object in an OrderedDictionary (or a hashtable), you need to use a foreach loop and a counter. Example (should be created as a function):
$hashTable = [Ordered]@{
'a' = 'blue';
'b'='green';
'c'='red'
}
$i = 0
foreach($key in $hashTable.Keys) {
if($key -eq "c") { $i; break }
else { $i++ }
}
That's how it works internally too. You can verify this by reading the source code for OrderedDictionary's IndexOfKey method in the .NET Reference Source.
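Wrapped in a function, as suggested above, the loop could look like the following sketch. The name Find-KeyIndex and the choice to return -1 for a missing key are assumptions, not part of the answer. Because it compares with -eq, the match follows PowerShell's comparison rules (case-insensitive for strings, with type conversion).

```powershell
# Hypothetical function wrapper around the foreach-and-counter approach.
function Find-KeyIndex {
    param(
        [System.Collections.IDictionary] $Dictionary,
        $Key
    )
    $i = 0
    foreach ($k in $Dictionary.Keys) {
        if ($k -eq $Key) { return $i }   # found: emit the current position
        $i++
    }
    return -1                            # assumption: -1 signals "not found"
}

$hashTable = [ordered]@{ 'a' = 'blue'; 'b' = 'green'; 'c' = 'red' }

Find-KeyIndex $hashTable 'c'   # 2
Find-KeyIndex $hashTable 'C'   # 2 (-eq ignores case)
Find-KeyIndex $hashTable 'z'   # -1
```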
For the initial problem I was attempting to solve, a comparable process state, you can now use Enumerations starting with PowerShell v5.
You use the Enum keyword, set the Enumerators by name, and give them an integer value. The value can be anything, but I'm using ascending values starting with 0 in this example:
Enum _ProcessState{
Error = 0
None = 1
Started = 2
Paused = 3
Cleanup = 4
Complete = 5
Verified = 6
}
#the leading _ for the Enum is just cosmetic & not required
Once you've created the Enum, you can assign it to variables. The contents of the variable will return the text name of the Enum, and you can compare them as if they were integers.
$Item1_State = [_ProcessState]::Started
$Item2_State = [_ProcessState]::Cleanup
#return state of second variable
$Item2_state
#comparison
$Item1_State -gt $Item2_State
Will return:
Cleanup
False
If you wanted to compare and return the highest:
#sort the two objects, then return the first result (should return the item with the largest enum int)
$results = ($Item1_State,$Item2_State | Sort-Object -Descending)
$results[0]
Fun fact, you can also use arithmetic on them, for example:
$Item1_State + 1
$Item1_State + $Item2_State
Will return:
Paused
Verified
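For the original goal of getting an "index" for a named state, casts are enough. The sketch below repeats the enum from above so that it runs on its own; the particular conversions shown are just illustrations.

```powershell
enum _ProcessState {
    Error    = 0
    None     = 1
    Started  = 2
    Paused   = 3
    Cleanup  = 4
    Complete = 5
    Verified = 6
}

[int][_ProcessState]::Complete       # 5        (name -> number)
[_ProcessState]3                     # Paused   (number -> name)
[_ProcessState]'cleanup'             # Cleanup  (string -> enum, case-insensitive)
[enum]::GetNames([_ProcessState])    # all names, ordered by value
```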
More info on Enum here:
https://blogs.technet.microsoft.com/heyscriptingguy/2015/08/26/new-powershell-5-feature-enumerations/
https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_enum?view=powershell-6
https://psdevopsug.scot/post/working-with-enums-in-powershell/
Related
I'm parsing a database table exported into csv where there are embedded fields in what is essentially a memo field.
The database also contains version history, and the csv contains all versions.
Basic structure of the data is Index(sequential record number),Reference(specific foreign key), Sequence (order of records for a given reference), and Data (the memo field with the data to parse).
You could think of the "Data" field as text documents limited to 80 chars wide and 40 chars deep, and then sequenced in the order they would print. Every record entry is assigned an ascending index.
For reference, $myParser is a [Microsoft.VisualBasic.FileIO.TextFieldParser], so ReadFields() returns a row of fields as an array/list.
My ultimate question is, how can this be formatted to be more intuitive to the reader? The code below is PowerShell; I'd be interested in answers relating to C# as well, since it's something of a language-agnostic style problem, though I think get/set would trivialize this to some degree.
Consider the following code (an insert/update routine in a 2 deep nested dictionary/hash):
enum cmtField
{
Index = 0
Sequence = 1
Reference = 2
Data = 4
}
$myRecords = [System.Collections.Generic.Dictionary[int,System.Collections.Generic.Dictionary[int,string]]]::new() #this could be a hash table, but is more verbose this way
While($true) #there's actually control here, but this provides a simple loop assuming infinite data
{
$myFields = $myParser.ReadFields() #read a line from the csvfile and return an array/list of fields for that line
if(!$myRecords.ContainsKey($myFields[[cmtField]::Reference])) #if the reference of the current record is new
{
$myRecords.Add($myFields[[cmtField]::Reference],[System.Collections.Generic.Dictionary[int,CommentRecord]]::new()) #create tier 1 reference index
$myRecords[$myFields[[cmtField]::Reference]].add($myFields[[cmtField]::Sequence],$myFields[[cmtField]::Data]) #create tier 2 sequence reference and data
}
else #if the reference already exists in the dictionary
{
if(!$myRecords[$myFields[[cmtField]::Reference]].ContainsKey($myFields[[cmtField]::Sequence])) #if the sequence ID of the current record is new
{
$myRecords[$myFields[[cmtField]::Reference]].Add($myFields[[cmtField]::Sequence],$myFields[[cmtField]::Data]) #add record at [reference][sequence]
}
else #if the sequence already exists for this reference
{
if($myRecords[$myFields[[cmtField]::Reference]][$myFields[[cmtField]::Sequence]].Index -lt $myFields[[cmtField]::Index]) #if the index of the currently read field is higher than the store index, it must be newer
{
$myRecords[$myFields[[cmtField]::Reference]][$myFields[[cmtField]::Sequence]] = $myFields[[cmtField]::Data] #replace with new data
}
#else discard currently read data (do nothing)
}
}
}
Frankly, trying to make this readable both makes my head hurt and my eyes bleed a little. It only gets messier and messier the deeper the dictionary goes. I'm stuck between the bracket soup and no self-documentation.
My ultimate question is, how can this be formatted to be more intuitive to the reader?
That... ultimately depends on who "the reader" is - is it your boss? Your colleagues? Me? Will you use this code sample to teach programming to someone?
In terms of making it less "messy", there are a couple of immediate steps you can take.
The first thing I would change to make your code more readable, would be to add a using namespace directive at the top of the file:
using namespace System.Collections.Generic
Now you can create nested dictionaries with:
[Dictionary[int,Dictionary[int,string]]]::new()
... as opposed to:
[System.Collections.Generic.Dictionary[int,System.Collections.Generic.Dictionary[int,string]]]::new()
The next thing I would reduce is repeated index access patterns like $myFields[[cmtField]::Reference] - you never modify $myFields after initial assignment at the top of the loop, so there's no need to delay resolution of it.
while($true)
{
$myFields = $myParser.ReadFields()
$Reference = $myFields[[cmtField]::Reference]
$Data = $myFields[[cmtField]::Data]
$Sequence = $myFields[[cmtField]::Sequence]
$Index = $myFields[[cmtField]::Index]
if(!$myRecords.ContainsKey($Reference)) #if the reference of the current record is new
{
$myRecords.Add($Reference,[Dictionary[int,CommentRecord]]::new()) #create tier 1 reference index
$myRecords[$Reference].Add($Sequence,$Data) #create tier 2 sequence reference and data
}
else
{
# ...
Finally, you can simplify the code vastly by abandoning nested if/else statements, and instead just break it down into a succession of steps that has to pass one by one, and you end up with something like this:
using namespace System.Collections.Generic
enum cmtField
{
Index = 0
Sequence = 1
Reference = 2
Data = 4
}
$myRecords = [Dictionary[int,Dictionary[int,CommentRecord]]]::new()
while($true)
{
$myFields = $myParser.ReadFields()
$Reference = $myFields[[cmtField]::Reference]
$Data = $myFields[[cmtField]::Data]
$Sequence = $myFields[[cmtField]::Sequence]
$Index = $myFields[[cmtField]::Index]
# Step 1 - ensure tier 1 dictionary is present
if(!$myRecords.ContainsKey($Reference))
{
$myRecords.Add($Reference,[Dictionary[int,CommentRecord]]::new())
}
# (now we only need to resolve `$myRecords[$Reference]` once)
$record = $myRecords[$Reference]
# step 2 - ensure sequence entry exists
if(!$record.ContainsKey($Sequence))
{
$record.Add($Sequence, $Data)
}
# step 3 - handle superseding comment records
if($record[$Sequence].Index -lt $Index)
{
$record[$Sequence] = $Data
}
}
I personally find this easier on the eyes (and mind) than the original if/else approach
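The same step-by-step flow can be tried end to end with made-up data. This sketch makes several assumptions that are not in the question: it uses plain hashtables instead of the typed dictionaries, it stores a [pscustomobject] with Index and Data properties (the CommentRecord type isn't shown above), and the sample rows are invented, in the order Index, Sequence, Reference, Data.

```powershell
# Invented sample rows: Index, Sequence, Reference, Data
$rows = @(
    @(1, 1, 100, 'first draft'),
    @(2, 2, 100, 'second line'),
    @(3, 1, 100, 'first draft, revised')   # newer version of reference 100, sequence 1
)

$myRecords = @{}   # reference -> (sequence -> record)

foreach ($row in $rows) {
    $index, $sequence, $reference, $data = $row

    # Step 1 - ensure the tier 1 entry is present
    if (-not $myRecords.ContainsKey($reference)) {
        $myRecords[$reference] = @{}
    }
    $bySequence = $myRecords[$reference]

    # Steps 2 and 3 - add a new entry, or replace an older one
    $existing = $bySequence[$sequence]
    if ($null -eq $existing -or $existing.Index -lt $index) {
        $bySequence[$sequence] = [pscustomobject]@{ Index = $index; Data = $data }
    }
}

$myRecords[100][1].Data   # first draft, revised
$myRecords[100][2].Data   # second line
```

Keeping the index alongside the data is what makes the "is this record newer?" comparison possible.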
I thought I had read enough examples here and elsewhere, but I still fail at creating arrays in PowerShell.
With this code I hoped to create slices of paired values from an array.
$values = @('hello','world','bonjour','moon','ola','mars')
function slice_array {
param (
[String[]]$Items
)
[int16] $size = 2
$pair = [string[]]::new($size) # size is 2
$returns = [System.Collections.ArrayList]@()
[int16] $_i = 0
foreach($item in $Items){
$pair[$_i] = $Item
$_i++;
if($_i -gt $size - 1){
$_i = 0
[void]$returns.Add($pair)
}
}
return $returns
}
slice_array($values)
the output is
ola
mars
ola
mars
ola
mars
I would hope for
'hello','world'
'bonjour','moon'
'ola','mars'
Is it possible to slice that array into an array of arrays of length 2?
Is there an explanation of why it doesn't work as expected?
How should the code be changed?
Thanks for any hints that help me properly understand arrays in PowerShell!
Here's a PowerShell-idiomatic solution (the fix required for your code is in the bottom section):
The function is named Get-Slices to adhere to PowerShell's verb-noun naming convention (see the docs for more information).
Note: Often, the singular form of the noun is used, e.g. Get-Item rather than Get-Items, given that you situationally may get one or multiple output values; however, since the express purpose here is to slice a single object into multiple parts, I've chosen the plural.
The slice size (count of elements per slice) is passed as a parameter.
The function uses .., the range operator, to extract a single slice from an array.
It uses PowerShell's implicit output behavior (no need for return, no need to build up a list of return values explicitly; see this answer for more information).
It shows how to output an array as a whole from a function, which requires wrapping it in an auxiliary single-element array using the unary form of ,, the array constructor operator. Without this auxiliary array, the array's elements would be output individually to the pipeline (which is also used for function / script output; see this answer for more information).
# Note: For brevity, argument validation, pipeline support, error handling, ...
# have been omitted.
function Get-Slices {
param (
[String[]] $Items
,
[int] $Size # The slice size (element count)
)
$sliceCount = [Math]::Ceiling($Items.Count / $Size)
if ($sliceCount -le 1) {
# array is empty or as large as or smaller than a slice? ->
# wrap it *twice* to ensure that the output is *always* an
# *array of arrays*, in this case containing just *one* element
# containing the original array.
,, $Items
}
else {
foreach ($offset in 0..($sliceCount-1)) {
, $Items[($offset * $Size)..(($offset+1) * $Size - 1)] # output this slice
}
}
}
To slice an array into pairs and collect the output in an array of arrays (jagged array):
$arrayOfPairs =
Get-Slices -Items 'hello','world','bonjour','moon','ola','mars' -Size 2
Note:
Shell-like syntax is required when you call functions (commands in general) in PowerShell: arguments are whitespace-separated and not enclosed in (...) (see this answer for more information)
Since a function's declared parameters are positional by default, naming the arguments as I've done above (-Items ..., -Size ...) isn't strictly necessary, but helps readability.
Two sample calls:
"`n-- Get pairs (slice count 2):"
Get-Slices -Items 'hello','world','bonjour','moon','ola','mars' -Size 2 |
ForEach-Object { $_ -join ', ' }
"`n-- Get slices of 3:"
Get-Slices -Items 'hello','world','bonjour','moon','ola','mars' -Size 3 |
ForEach-Object { $_ -join ', ' }
The above yields:
-- Get pairs (slice count 2):
hello, world
bonjour, moon
ola, mars
-- Get slices of 3:
hello, world, bonjour
moon, ola, mars
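The effect of the unary , operator on function output can be seen in isolation. In this sketch the function names Get-Flat and Get-Whole are made up for illustration.

```powershell
# Hypothetical demo functions
function Get-Flat  { 1, 2, 3 }       # the elements are written to the pipeline one by one
function Get-Whole { , (1, 2, 3) }   # the wrapper is unrolled, so the inner array arrives intact

@(Get-Flat).Count        # 3 - three separate output objects
@(Get-Whole).Count       # 1 - a single output object...
@(Get-Whole)[0].Count    # 3 - ...which is the original 3-element array
```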
As for what you tried:
The only problem with your code was that you kept reusing the very same auxiliary array for collecting a pair of elements, so each iteration overwrote the elements stored by the previous one. In the end, your array list contained multiple references to the same pair array, which reflected the last pair only.
This behavior occurs because arrays are instances of reference types rather than value types - see this answer for background information.
The simplest solution is to add a (shallow) clone of your $pair array to your list, which ensures that each list entry is a distinct array:
[void]$returns.Add($pair.Clone())
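The difference between adding the array itself and adding a clone can be seen in a few lines. This is a standalone sketch; the variable $list is introduced only for the demonstration.

```powershell
$pair = [string[]]::new(2)
$list = [System.Collections.ArrayList]::new()

$pair[0] = 'hello'; $pair[1] = 'world'
[void]$list.Add($pair)            # stores a reference to the $pair array
[void]$list.Add($pair.Clone())    # stores an independent (shallow) copy

$pair[0] = 'ola'; $pair[1] = 'mars'   # overwrite the shared array

$list[0] -join ','   # ola,mars     (same object as $pair)
$list[1] -join ','   # hello,world  (the clone kept the old values)
[object]::ReferenceEquals($list[0], $pair)   # True
```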
Why you got 3 equal pairs instead of different pairs:
.NET (which PowerShell is built on) is object-oriented and has the concept of reference types and value types. Most types, including arrays, are reference types.
What happens in your code:
1. You create a [string[]] object and assign it to $pair. The $pair variable actually stores the memory address of (a reference to) that object, because arrays are reference types.
2. You fill the $pair array with values.
3. You add (!) $pair to $returns. Remember that $pair is a reference to a memory block, so what gets added to $returns is the memory address of the [string[]] you wrote the values to.
4. You repeat step 2: you fill the $pair array with different values, but the address of this array in memory stays the same. Doing this, you actually replace the values from step 2 with new values in the same $pair object.
5. Same as step 3.
6. Same as step 4.
7. Same as step 3.
As a result, $returns contains the same memory address three times: [[reference to $pair], [reference to $pair], [reference to $pair]]. And the $pair values were overwritten with the values of the last pair.
On output it works like this:
PowerShell looks at $returns, which is an array list.
PowerShell looks at $returns[0], which references $pair
PowerShell outputs $pair[0]
PowerShell outputs $pair[1]
PowerShell looks at $returns[1], which references $pair
PowerShell outputs $pair[0]
PowerShell outputs $pair[1]
PowerShell looks at $returns[2], which references $pair
PowerShell outputs $pair[0]
PowerShell outputs $pair[1]
So you see, you output the object at the same memory address three times. You overwrote it 3 times in slice_array, and now it stores only the values of the last pair.
To fix it in your code, you should create a new $pair in memory: add $pair = [string[]]::new($size) just after $returns.Add($pair)
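Applied to the function from the question, that fix could look like the sketch below. Apart from the new allocation, the changes are cosmetic; the call at the end uses shell-style syntax and collects the result in $pairs, a variable introduced here for illustration.

```powershell
function slice_array {
    param (
        [String[]] $Items
    )
    $size = 2
    $pair = [string[]]::new($size)
    $returns = [System.Collections.ArrayList]::new()
    $i = 0
    foreach ($item in $Items) {
        $pair[$i] = $item
        $i++
        if ($i -ge $size) {
            [void]$returns.Add($pair)
            $pair = [string[]]::new($size)   # the fix: a fresh array for the next pair
            $i = 0
        }
    }
    return $returns
}

$values = @('hello','world','bonjour','moon','ola','mars')
$pairs = slice_array $values

$pairs.Count                              # 3
$pairs | ForEach-Object { $_ -join ',' }  # hello,world / bonjour,moon / ola,mars
```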
I'm reading a tutorial and learned that PowerShell supports ordered hashes. When would I use that feature?
Sample code of what I'm talking about:
$hash = [ordered]@{ ID = 1; Shape = "Square"; Color = "Blue"}
Let me complement Maximilian Burszley's helpful answer with a broader perspective:
tl;dr
Most of the time you want [ordered] @{ ... } ([System.Collections.Specialized.OrderedDictionary]) (PSv3+):
It provides enumeration of the entries in the order in which they were defined (also reflected in the .Keys and .Values collection properties).
It also allows accessing entries by index, like an array.
Typically, you can use [ordered] @{ ... } interchangeably with @{ ... }, the regular hashtable, [hashtable] a.k.a [System.Collections.Hashtable], because both types implement the [IDictionary] interface, which is how parameters that accept hash tables are typically typed.
The performance penalty you pay for using [ordered] is negligible.
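A short sketch of the first two points, using the hashtable from the question (the variable name $oht is an arbitrary choice):

```powershell
$oht = [ordered] @{ ID = 1; Shape = 'Square'; Color = 'Blue' }

$oht.Keys -join ', '     # ID, Shape, Color  (definition order is preserved)
$oht[0]                  # 1       (access by index)
$oht[$oht.Count - 1]     # Blue    (last entry, by index)
$oht['Shape']            # Square  (access by key still works)
```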
Some background:
For technical reasons, the most efficient implementation of a hashtable (hash table) is to let the ordering of entries be the outcome of implementation details, without guaranteeing any particular order to the caller.
This is fine for use cases where all you do is to perform isolated lookups by key, where the ordering among keys (entries) is irrelevant.
However, often you do care about the ordering of entries:
in the simplest case, for display purposes; there is something disconcerting about seeing the definition order jumbled; e.g.:
@{ one = 1; two = 2; three = 3 }
Name Value
---- -----
one 1
three 3 # !!
two 2
more importantly, the enumeration of entries may need to be predictable for further programmatic processing; e.g. (note: strictly speaking, property order doesn't matter in JSON, but it is again important for the human observer):
# No guaranteed property order.
PS> @{ one = 1; two = 2; three = 3 } | ConvertTo-Json
{
"one": 1,
"three": 3, # !!
"two": 2
}
# Guaranteed property order.
PS> [ordered] @{ one = 1; two = 2; three = 3 } | ConvertTo-Json
{
"one": 1,
"two": 2,
"three": 3
}
It's unfortunate that PowerShell's hashtable-literal syntax, @{ ... }, doesn't default to [ordered][1], but it is too late to change that.
There is one context in which [ordered] is implied, however: if you cast a hashtable literal to [pscustomobject] in order to create a custom object:
[pscustomobject] @{ ... } is syntactic sugar for [pscustomobject] [ordered] @{ ... }; that is, the resulting custom object's properties are ordered based on the entry order in the hashtable literal; e.g.:
PS> [pscustomobject] @{ one = 1; two = 2; three = 3 }
one two three # order preserved!
--- --- -----
1 2 3
Note, however, that this only works exactly as shown above, that is, if the cast is applied directly to a hashtable literal; if you store the hashtable in a variable first, or if you even just enclose the literal in (...), the ordering is lost:
PS> $ht = @{ one = 1; two = 2; three = 3 }; [pscustomobject] $ht
one three two # !! Order not preserved.
--- ----- ---
1 3 2
PS> [pscustomobject] (@{ one = 1; two = 2; three = 3 }) # Note the (...)
one three two # !! Order not preserved.
--- ----- ---
1 3 2
Therefore, if you construct a hashtable iteratively first and then cast it to [pscustomobject], you must start with an [ordered] hashtable to get predictable ordering of properties; this technique is useful, because it's easier to create hashtable entries than it is to add properties to a custom object; e.g.:
$oht = [ordered] @{} # Start with an empty *ordered* hashtable
# Add entries iteratively.
$i = 0
foreach ($name in 'one', 'two', 'three') {
$oht[$name] = ++$i
}
[pscustomobject] $oht # Convert the ordered hashtable to a custom object
Finally, note that [ordered] can only be applied to a hashtable literal; you cannot use it to convert a preexisting regular hashtable to an ordered one (which wouldn't make any sense anyway, because you have no defined order to begin with):
PS> $ht = @{ one = 1; two = 2; three = 3 }; [ordered] $ht # !! Error
...
The ordered attribute can be specified only on a hash literal node.
...
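If you do need an ordered copy of an existing hashtable, you have to impose an order yourself. A minimal sketch, assuming that sorting by key name is the order you want:

```powershell
$ht = @{ one = 1; two = 2; three = 3 }

# Copy the entries into an empty ordered hashtable, sorted by key.
$oht = [ordered] @{}
foreach ($key in ($ht.Keys | Sort-Object)) {
    $oht[$key] = $ht[$key]
}

$oht.Keys -join ', '   # one, three, two  (alphabetical, and stable from here on)
```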
On a side note: Neither ordered nor regular hashtables enumerate their entries when sent through the pipeline; they are sent as a whole.
To enumerate the entries, use the .GetEnumerator() method; e.g.:
@{ one = 1; two = 2; three = 3 }.GetEnumerator() | ForEach-Object { $_.Value }
1
3 # !!
2
As for the performance impact of using [ordered]:
As noted, it is negligible; here are some sample timings, averaged across 10,000 runs, using Time-Command:
Time-Command -Count 10000 { $ht=@{one=1;two=2;three=3;four=4;five=5;six=6;seven=7;eight=8;nine=9}; foreach($k in $ht.Keys){$ht.$k} },
{ $ht=[ordered] @{one=1;two=2;three=3;four=4;five=5;six=6;seven=7;eight=8;nine=9}; foreach($k in $ht.Keys){$ht.$k} }
Sample timings (Windows PowerShell 5.1 on Windows 10, single-core VM):
Command TimeSpan Factor
------- -------- ------
$ht=@{one=1;two=2;th... 00:00:00.0000501 1.00
$ht=[ordered] @{one=... 00:00:00.0000527 1.05
That is, [ordered] amounted to a mere 5% slowdown.
[1] Maximilian Burszley points out one tricky aspect specific to [ordered] hashtables:
With numeric keys, distinguishing between a key and an index can become tricky; to force interpretation of a number as a key, cast it to [object] or use dot notation (., property-access syntax) instead of index syntax ([...]):
# Ordered hashtable with numeric keys.
PS> $oht = [ordered] @{ 1 = 'one'; 2 = 'two' }
PS> $oht[1] # interpreted as *index* -> 2nd entry
two
PS> $oht[[object] 1] # interpreted as *key* -> 1st entry.
one
PS> $oht.1 # dot notation - interpreted as *key* -> 1st entry.
one
That said, numeric keys aren't common, and to me the benefit of defaulting to predictable enumeration outweighs this minor problem.
The .NET type underlying [ordered], System.Collections.Specialized.OrderedDictionary, has been available since v1, so PowerShell could have chosen it as the default implementation for @{ ... } from the get-go, even in PowerShell v1.
Given PowerShell's commitment to backward compatibility, changing the default is no longer an option, however, as that could break existing code, namely in the following ways:
There may be existing code that checks untyped arguments for whether they're a hashtable with -is [hashtable], which would no longer work with an ordered hashtable (however, checking with -is [System.Collections.IDictionary] would work).
There may be existing code that relies on hashtables with numeric keys, in which case the index-syntax lookup behavior would change (see example above).
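The first point is easy to check; this sketch just evaluates the two type tests on an ordered and a regular hashtable:

```powershell
$oht = [ordered] @{ one = 1 }
$ht  = @{ one = 1 }

$oht -is [hashtable]                        # False
$oht -is [System.Collections.IDictionary]   # True
$ht  -is [hashtable]                        # True
$ht  -is [System.Collections.IDictionary]   # True
```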
The reason for an ordered dictionary is for display / typecast purposes. For example, if you want to cast your hashtable to a PSCustomObject and you want your keys to be in the order you enter them, you use ordered.
The use case here is Export-Csv: with an ordered dictionary, the headers come out in the right order. This is just one example I could think of off the top of my head. By design, the hashtable type doesn't keep track of the order in which you enter keys/values, so the order in which they are displayed on the success stream is not guaranteed to match it.
An additional use-case for the ordered dictionary: you can treat your hashtable as an array and use numerical accessors to find items, such as $myOrderedHash[-1] will grab the last item added to the dictionary.
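The CSV header case can be sketched without writing a file by using ConvertTo-Csv, which produces the same text that Export-Csv would write. The sample entries are taken from the question; the variable names are arbitrary.

```powershell
$row = [ordered] @{ ID = 1; Shape = 'Square'; Color = 'Blue' }

# Convert the ordered hashtable to a custom object, then to CSV lines.
$csv = [pscustomobject] $row | ConvertTo-Csv -NoTypeInformation

$csv[0]   # "ID","Shape","Color"    (header follows the definition order)
$csv[1]   # "1","Square","Blue"
```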
I have a "structured" file (logical fixed-length records) from a legacy program on a legacy (non-MS) operating system. I know how the records were structured in the original program, but the original O/S handled structured data as a sequence of bytes for file I/O, so a hex dump won't show you anything more than what the record length is (there are marker bytes and other record overhead imposed by the access method API used to generate the file originally).
Once I have the sequence of bytes in a Powershell variable, with the overhead bytes "cut away", how can I convert this into a structured object? Some of the "fields" are 16-bit integers, some are strings of the form [s]data (where [s] is a byte giving the length of the "real" data in that field), some are BCD coded fixed-point numbers, some are IEEE floats.
(I haven't been specific about the structure, either on the Powershell side or on the legacy side, because I am seeking a more-or-less 'generic' solution/technique, as I actually have several different files with different record structures to process.)
Initially, I tried to do it by creating a type that could take the buffer and overwrite a struct so that all the fields were nicely filled in. However, certain issues arose (regarding struct layout, fixed buffers and mixing fixed and managed members) and I also realised that there was no guarantee that the data in the buffer would be properly (or even legally) aligned. Decided to try a more programmatic path.
"Manual" parsing is out, so how about automatic parsing? You're going to need to define the members of your PSobject at some point, why not do it in a way that can help programmatically parse the data. This method does not require the data in the buffer to be correctly aligned or even contiguous. You can also have fields overlap to separate raw unions into the individual members (though, typically, only one will contain a "correct" value).
First step, build a hash table to identify the members, the offset in the buffer, their data types and, if an array, the number of elements :
$struct = @{
field1 = 0,[int],0; # 0 means not an array
field2 = 4,[byte],16; # a C string maybe
field3 = 24,[char],32; # wchar_t[32] ? note: skipped over bytes 20-23
field4 = 56,[double],0
}
# the names field1/2/3/4 are arbitrary, any valid member name may be used (but not
# necessarily any valid hash key if you want a PSObject as the end result).
# also, the values could be hash tables instead of arrays. that would allow
# descriptive names for the values but doesn't affect the end result.
Next, use [BitConverter] to extract the required data. The problem here is that we need to call the correct method for all the varying types. Just use a (big) switch statement. The basic principle is the same for most values: get the type indicator and initial offset from the $struct definition, then call the correct [BitConverter] method and supply the buffer and initial offset, update the offset to where the next element of an array would be, and then repeat for as many array elements as are required. The only trap here is that the data in the buffer must have the same format as expected by [BitConverter], so for the [double] example, the bytes in the buffer must conform to IEEE-754 floating point format (assuming that [BitConverter]::ToDouble() is used). Thus, for example, raw data from a Paradox database will need some tweaking because it flips the high bit to simplify sorting.
$struct.keys | foreach {
# key order is undefined but that won't affect the final object's members
$hashobject = @{}
} {
$fieldoffs = $struct[$_][0]
$fieldtype = $struct[$_][1]
if (($arraysize = $struct[$_][2]) -ne 0) { # yes, I'm a C programmer from way back
$array = @()
} else {
$array = $null
}
:w while ($arraysize-- -ge 0) {
switch($fieldtype) {
([int]) {
$value = [bitconverter]::toint32($buffer, $fieldoffs)
$fieldoffs += 4
}
([byte]) {
$value = $buffer[$fieldoffs++]
}
([char]) {
$value = [bitconverter]::tochar($buffer, $fieldoffs)
$fieldoffs += 2
}
([string]) { # ANSI string, 1 byte per character
$array = new-object string (,[char[]]$buffer[$fieldoffs..($fieldoffs+$arraysize)])
# $arraysize has already been decremented so don't need to subtract 1
break w # "array size" was actually string length so don't loop
#
# description:
# first, get a slice of the buffer as a byte[] (assume single byte characters)
# next, convert each byte to a char in a char[]
# then, invoke the constructor String(Char[])
# finally, put the String into $array ready for insertion into $hashobject
#
# Note the convoluted syntax - New-Object expects the second argument to be
# an array of the constructor parameters but String(Char[]) requires only
# one argument that is itself an array. By itself,
# [char[]]$buffer[$fieldoffs..($fieldoffs+$arraysize)]
# is treated by PowerShell as an argument list of individual chars, corrupting the
# constructor call. The normal trick is to prepend a single comma to create an array
# of one element which is itself an array
# ,[char[]]$buffer[$fieldoffs..($fieldoffs+$arraysize)]
# but this won't work because of the way PowerShell parses the command line. The
# space before the comma is ignored so that instead of getting 2 arguments (a string
# "String" and the array of an array of char), there is only one argument, an array
# of 2 elements ("String" and array of array of char) thereby totally confusing
# New-Object. To make it work you need to ALSO isolate the single element array into
# its own expression. Hence the parentheses
# (,[char[]]$buffer[$fieldoffs..($fieldoffs+$arraysize)])
#
}
}
if ($null -ne $array) {
# must be in this order* to stop the -ne from enumerating $array to compare against
# $null. this would result in the condition being considered false if $array were
# empty ( (@() -ne $null) -> $null -> $false ) or contained only one element with
# the value 0 ( (@(0) -ne $null) -> (scalar) 0 -> $false ).
$array += $value
# $array is not $null so must be an array to which $value is appended
} else {
# $array is $null only if $arraysize -eq 0 before the loop (and is now -1)
$array = $value
# so the loop won't repeat thus leaving this one scalar in $array
}
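if ($arraysize -eq 0) { break } # array complete; the post-decrement test above would otherwise read one element too many
}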
$hashobject[$_] = $array
}
#*could have reversed it as
# if ($array -eq $null) { scalar } else { collect array }
# since the condition will only be true if $array is actually $null or contains at
# least 2 $null elements (but no valid conversion will produce $null)
At this point there is a hash table, $hashobject, with keys equal to the field names and values containing the bytes from the buffer arranged into single (or arrays of) numeric (inc. char/boolean) values or (ANSI) strings. To create a (proper) object, just invoke New-Object -TypeName PSObject -Property $hashobject or use [PSCustomObject]$hashobject.
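The [BitConverter] calls and the final conversion can be checked in isolation with a round trip. In this sketch the two-field layout (a 32-bit integer at offset 0 and a double at offset 4) and the field names Quantity and Ratio are assumptions made for the demonstration. Because the buffer is written and read on the same machine, byte order doesn't matter here.

```powershell
# Build a 12-byte test buffer: [int] at offset 0, [double] at offset 4.
$buffer = [byte[]]::new(12)
[BitConverter]::GetBytes([int]42).CopyTo($buffer, 0)
[BitConverter]::GetBytes([double]1.5).CopyTo($buffer, 4)

# Parse the fields back out and turn the hash table into an object.
$hashobject = @{
    Quantity = [BitConverter]::ToInt32($buffer, 0)
    Ratio    = [BitConverter]::ToDouble($buffer, 4)
}
$record = [pscustomobject] $hashobject

$record.Quantity   # 42
$record.Ratio      # 1.5
```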
Of course, if the buffer actually contained structured data then the process would be more complicated but the basic procedure would be the same. Note also that the "types" used in the $struct hash table have no direct effect on the resultant types of the object members, they are only convenient selectors for the switch statement. It would work just as well with strings or numbers. In fact, the parentheses around the case labels are because switch parses them the same as command arguments. Without the parentheses, the labels would be treated as literal strings. With them, the labels are evaluated as a type object. Both the label and the switch value are then converted to strings (that's what switch does for values other than script blocks or $null) but each type has a distinct string representation so the case labels will still match up correctly. (Not really on point but still interesting, I think.)
Several optimisations are possible but increase the complexity slightly. E.g.
([byte]) { # already have a byte[] so why collect bytes one at a time
if ($arraysize -ge 0) { # was originally -gt 0 so want a byte[]
$array = [byte[]]$buffer[$fieldoffs..($fieldoffs+$arraysize)]
# slicing the byte array produces an object array (of bytes) so cast it back
} else { # $arraysize was 0 so just a single byte
$array = $buffer[$fieldoffs]
}
break w # $array ready for insertion into $hashobject, don't need to loop
}
"But what if my strings are actually Unicode?", you say. Easy: just use the existing methods of the [Text.Encoding] class,
([string]) { # Unicode string, 2 (LE) bytes per character
$array = [text.encoding]::unicode.getstring([byte[]]$buffer[$fieldoffs..($fieldoffs+$arraysize*2+1)])
# $arraysize should be the string length so, initially, $arraysize*2 is the byte
# count and $arraysize*2-1 is the end index (relative to $fieldoffs) but $arraysize
# was decremented so the end index is now $arraysize*2+1, i.e. length*2-1 = (length-1)*2+1
break w # got $array, no loop
}
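To check the index arithmetic in isolation, here is a standalone sketch. The buffer contents are made up, and $length holds the plain character count (it is not decremented as $arraysize is in the loop above), so the end index is simply length*2-1 relative to the field offset:

```powershell
# Hypothetical buffer: 2 bytes of padding followed by "Hi!" encoded as UTF-16LE.
[byte[]]$buffer = [byte[]](0xFF, 0xFF) + [Text.Encoding]::Unicode.GetBytes('Hi!')
$fieldoffs = 2      # byte offset of the string field
$length    = 3      # string length in characters

# Slice the bytes of the field and decode them.
$last = $fieldoffs + $length * 2 - 1
$text = [Text.Encoding]::Unicode.GetString([byte[]]$buffer[$fieldoffs..$last])
$text            # Hi!

# Alternative: the (bytes, index, count) overload avoids slicing altogether.
$text2 = [Text.Encoding]::Unicode.GetString($buffer, $fieldoffs, $length * 2)
$text2           # Hi!
```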
You could also have both ANSI and Unicode by utilising a different type indicator for the ANSI string, maybe [char[]]. Remember, the type indicators do not affect the result, they just have to be distinct (and hopefully meaningful) identifiers.
I realise that this is not quite the "just dump the bytes into a union or variant record" solution mentioned in the OP's comment, but PowerShell is based on .NET and uses managed objects, where this sort of thing is largely prohibited (or difficult to get working, as I found). For example, assuming you could just dump raw chars (not bytes) into a String, how would the Length property get updated? This method also allows some useful preprocessing, such as splitting up unions as noted above or converting raw byte or char arrays into the Strings they represent.
I've been googling for a while, but I haven't found a solution to my problem.
I should say that I'm a newbie in PowerShell.
I would like to create the following array
$a = (A,B,C,D) where
A = 1 string (always)
B = 1 string (always)
C = undefined number of strings. I need to be able to add elements dynamically
D = undefined number of strings. I need to be able to add elements dynamically (same number as C)
Is this possible?
Example of 2 elements of the array
("WSTM0123456", "192.168.10.155",("WSTM8765421","WSTM9856454","WSTM1289765"),("192.36.36.36", "187.25.25.25","192.69.89.65"))
("WLDN1251254", "156.25.36.54", ("WLDN1234512", "WLDN9865323"), ("187.154.12.12","163.136.25.98"))
I don't know in advance how many elements will be in C and D, and I'll have to append strings at positions C and D in a for loop.
Goal: group many strings (C and D) under the strings (A/B) that they have in common.
Any help would be appreciated
Thanks,
Marco
You can do this, but it's probably quite painful as dealing with arrays is sometimes cumbersome in PowerShell due to lots of implicit flattening.
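A short sketch of the flattening in question (the values are placeholders): a nested array literal keeps its inner array, but += appends the elements of an array one by one unless the array is wrapped with the unary comma operator.

```powershell
# A nested literal keeps the inner array as a single element.
$row = @('A', 'B', @('c1', 'c2'))
$row.Count          # 3

# += with an array on the right appends each element separately.
$row += @('x', 'y')
$row.Count          # 5

# Wrapping the array with the unary comma appends it as one element.
$row2 = @('A', 'B', @('c1', 'c2'))
$row2 += , @('x', 'y')
$row2.Count         # 4
```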
I'd suggest creating a custom type for this. Then you can also give the individual parts useful names (I don't know the purpose of what you're doing, so I'm making up names here; feel free to change them):
$properties = @{
Name = 'WSTM0123456';
IP = [ipaddress]'192.168.10.155';
ListOfNames = @("WSTM8765421","WSTM9856454","WSTM1289765");
ListOfIPs = [ipaddress[]]@("192.36.36.36", "187.25.25.25","192.69.89.65")
}
$foo = New-Object PSObject -Property $properties
Then you can simply append new items like so:
$foo.ListOfNames += 'AnotherName'
I think this is pretty much the same idea: use a hash table, and make two of its elements arrays. This is how you would create the arrays "on the fly" at runtime, without knowing any of the contents in advance. It takes $x and puts every item that starts with "t" in "C" and everything else in "D":
$a = @{A = "Some string";B = "Some other string"}
$x = "one","two","three","four","five"
$x |% {
if ($_ -match "^t"){$a["C"] += @($_)}
else {$a["D"] += @($_)}
}
$a.a
Some string
$a.b
Some other string
$a.c
two
three
$a.d
one
four
five
$obj = new-object psobject -property $a
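For the case in the question, where there are several A/B pairs and the C and D entries arrive one at a time, the same idea can be extended with an outer hash table keyed by A. This is only a sketch: the $rows input format and the property names are assumptions, and the sample values are taken from the question.

```powershell
# Hypothetical input: each row pairs a parent (A/B) with one child name and IP (C/D).
$rows = @(
    @{ Name = 'WSTM0123456'; IP = '192.168.10.155'; ChildName = 'WSTM8765421'; ChildIP = '192.36.36.36' }
    @{ Name = 'WSTM0123456'; IP = '192.168.10.155'; ChildName = 'WSTM9856454'; ChildIP = '187.25.25.25' }
    @{ Name = 'WLDN1251254'; IP = '156.25.36.54';   ChildName = 'WLDN1234512'; ChildIP = '187.154.12.12' }
)

$groups = @{}
foreach ($row in $rows) {
    $key = $row.Name
    if (-not $groups.ContainsKey($key)) {
        # First time this parent is seen: create its object with empty lists.
        $groups[$key] = New-Object PSObject -Property @{
            Name        = $row.Name
            IP          = $row.IP
            ListOfNames = @()
            ListOfIPs   = @()
        }
    }
    # Append to both lists together so they stay the same length.
    $groups[$key].ListOfNames += $row.ChildName
    $groups[$key].ListOfIPs   += $row.ChildIP
}

$groups['WSTM0123456'].ListOfNames.Count   # 2
$groups['WLDN1251254'].ListOfIPs[0]        # 187.154.12.12
```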