I'm parsing a database table exported into csv where there are embedded fields in what is essentially a memo field.
The database also contains version history, and the csv contains all versions.
The basic structure of the data is Index (sequential record number), Reference (a specific foreign key), Sequence (the order of records for a given reference), and Data (the memo field with the data to parse).
You could think of the "Data" field as text documents limited to 80 chars wide and 40 chars deep, and then sequenced in the order they would print. Every record entry is assigned an ascending index.
For reference, $myParser is a [Microsoft.VisualBasic.FileIO.TextFieldParser], so ReadFields() returns a row of fields as an array/list.
The code below is PowerShell, but I'd be interested in answers relating to C# as well, as it's something of a language-agnostic style problem, though I think get/set would trivialize this to some degree.
Consider the following code (an insert/update routine in a 2 deep nested dictionary/hash):
enum cmtField
{
Index = 0
Sequence = 1
Reference = 2
Data = 4
}
$myRecords = [System.Collections.Generic.Dictionary[int,System.Collections.Generic.Dictionary[int,string]]]::new() #this could be a hash table, but is more verbose this way
While($true) #there's actually control here, but this provides a simple loop assuming infinite data
{
$myFields = $myParser.ReadFields() #read a line from the csvfile and return an array/list of fields for that line
if(!$myRecords.ContainsKey($myFields[[cmtField]::Reference])) #if the reference of the current record is new
{
$myRecords.Add($myFields[[cmtField]::Reference],[System.Collections.Generic.Dictionary[int,CommentRecord]]::new()) #create tier 1 reference index
$myRecords[$myFields[[cmtField]::Reference]].add($myFields[[cmtField]::Sequence],$myFields[[cmtField]::Data]) #create tier 2 sequence reference and data
}
else #if the reference already exists in the dictionary
{
if(!$myRecords[$myFields[[cmtField]::Reference]].ContainsKey($myFields[[cmtField]::Sequence])) #if the sequence ID of the current record is new
{
$myRecords[$myFields[[cmtField]::Reference]].Add($myFields[[cmtField]::Sequence],$myFields[[cmtField]::Data]) #add record at [reference][sequence]
}
else #if the sequence already exists for this reference
{
if($myRecords[$myFields[[cmtField]::Reference]][$myFields[[cmtField]::Sequence]].Index -lt $myFields[[cmtField]::Index]) #if the index of the currently read field is higher than the store index, it must be newer
{
$myRecords[$myFields[[cmtField]::Reference]][$myFields[[cmtField]::Sequence]] = $myFields[[cmtField]::Data] #replace with new data
}
#else discard currently read data (do nothing)
}
}
}
Frankly, trying to make this readable both makes my head hurt and my eyes bleed a little. It only gets messier and messier the deeper the dictionary goes. I'm stuck between the bracket soup and no self-documentation.
My ultimate question is, how can this be formatted to be more intuitive to the reader?
That... ultimately depends on who "the reader" is - is it your boss? Your colleagues? Me? Will you use this code sample to teach programming to someone?
In terms of making it less "messy", there are a couple of immediate steps you can take.
The first thing I would change to make your code more readable would be to add a using namespace directive at the top of the file:
using namespace System.Collections.Generic
Now you can create nested dictionaries with:
[Dictionary[int,Dictionary[int,string]]]::new()
... as opposed to:
[System.Collections.Generic.Dictionary[int,System.Collections.Generic.Dictionary[int,string]]]::new()
The next thing I would reduce is repeated index access patterns like $myFields[[cmtField]::Reference] - you never modify $myFields after initial assignment at the top of the loop, so there's no need to delay resolution of it.
while($true)
{
$myFields = $myParser.ReadFields()
$Reference = $myFields[[cmtField]::Reference]
$Data = $myFields[[cmtField]::Data]
$Sequence = $myFields[[cmtField]::Sequence]
$Index = $myFields[[cmtField]::Index]
if(!$myRecords.ContainsKey($Reference)) #if the reference of the current record is new
{
$myRecords.Add($Reference,[Dictionary[int,CommentRecord]]::new()) #create tier 1 reference index
$myRecords[$Reference].Add($Sequence,$Data) #create tier 2 sequence reference and data
}
else
{
# ...
Finally, you can simplify the code vastly by abandoning the nested if/else statements and instead breaking the work down into a succession of steps that each record has to pass through one by one. You end up with something like this:
using namespace System.Collections.Generic
enum cmtField
{
Index = 0
Sequence = 1
Reference = 2
Data = 4
}
$myRecords = [Dictionary[int,Dictionary[int,CommentRecord]]]::new()
while($true)
{
$myFields = $myParser.ReadFields()
$Reference = $myFields[[cmtField]::Reference]
$Data = $myFields[[cmtField]::Data]
$Sequence = $myFields[[cmtField]::Sequence]
$Index = $myFields[[cmtField]::Index]
# Step 1 - ensure tier 1 dictionary is present
if(!$myRecords.ContainsKey($Reference))
{
$myRecords.Add($Reference,[Dictionary[int,CommentRecord]]::new())
}
# (now we only need to resolve `$myRecords[$Reference]` once)
$record = $myRecords[$Reference]
# Step 2 - ensure sequence entry exists
if(!$record.ContainsKey($Sequence))
{
$record.Add($Sequence, $Data)
}
# Step 3 - handle superseding comment records
if($record[$Sequence].Index -lt $Index)
{
$record[$Sequence] = $Data
}
}
I personally find this easier on the eyes (and mind) than the original if/else approach.
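If you want to go one step further (a sketch only, not part of the answer above), the per-row logic can be pushed into a small helper function. The sketch also stores Index alongside Data - which the comparison in step 3 needs in order to read something meaningful - and swaps the typed dictionaries for plain hashtables to keep it short:
function Add-CommentRecord {
    param([hashtable]$Records, [int]$Reference, [int]$Sequence, [int]$Index, [string]$Data)

    if (-not $Records.ContainsKey($Reference)) {
        $Records[$Reference] = @{}    # tier 1: reference
    }
    $existing = $Records[$Reference][$Sequence]
    if ($null -eq $existing -or $existing.Index -lt $Index) {
        # keep Index next to Data so a later, higher-indexed row can supersede this one
        $Records[$Reference][$Sequence] = [pscustomobject]@{ Index = $Index; Data = $Data }
    }
}

$myRecords = @{}
# inside the read loop:
# Add-CommentRecord -Records $myRecords -Reference $Reference -Sequence $Sequence -Index $Index -Data $Data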
I'm not sure how to explain what I'm trying to do but maybe the code will help you understand.
I might get a few terms wrong, I'm still learning.
I was unable to find any documentation that would help my case.
I feel like I'm overcomplicating things and could just use a lot of "if" statements, but this seemed like a much more efficient approach (if it's possible) for what I'm trying to accomplish.
I am trying to retrieve multiple hash values with a "foreach" statement, using the "Read-Host" inputs as the keys.
#Hash
$AppGroupsHash = @{
1 = "AppADGroup1"
2 = "AppADGroup2"
3 = "AppADGroup3"
4 = "AppADGroup4"
}
$AppGroups = $AppGroupsHash.Values -as [string[]]
#write-host for flair and choices to choose from.
Write-Host "`n1 = AppADGroup1`n2 = AppADGroup2`n3 = AppADGroup3`n4 = AppADGroup4`n " -ForegroundColor Yellow
#User input value(s) for the required task
#Not sure if [string] is necessary in front of $AppGroupsKeys
[string]$AppGroupsKeys = Read-Host "Enter App Hash values (1 - 2 - 3 - 4) (No space, delimit with ';') "
foreach ($AppGroupsKey in $AppGroupsKeys.split(';'))
{
if ($AppGroupsHash.Keys -ccontains $AppGroupsKey)
{
#To test if it will print the values I'm trying to invoke
echo $AppGroupsKey
#This is where I'm getting into trouble
$AppGroupsHash.($AppGroupsKey)
}
}
When I run it with Read-Host values 1;3;4, this is what I get:
#These are the values being echoed but the "$AppGroupsHash.($AppGroupsKey)" does not work
1
3
4
What it should print out:
#Test "echo values"
1
3
4
#The "values" I want
AppADGroup1
AppADGroup3
AppADGroup4
I want to use the variables I've inputted as the Keys that are stored in the hash so that I can invoke those values, these will be eventually used to add users to those groups based on their requirements.
This is what I tried:
foreach ($AppGroupsKey in $AppGroupsKeys.split(';'))
{
if ($AppGroupsHash.Keys -ccontains $AppGroupsKey)
{
#To test if it will print the values I'm trying to invoke
echo $AppGroupsKey
#This is where I'm getting into trouble
write-host "$AppGroupsHash.$AppGroupsKey;"
}
}
Output:
1
System.Collections.Hashtable.1;
2
System.Collections.Hashtable.2;
4
System.Collections.Hashtable.4;
I feel like it's possible but I also feel like I'm missing some sort of secret syntax for this to work.
Is what I want to accomplish possible, or should I try arrays or just a bunch of "if" statements?
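For what it's worth, the lookup itself works once the key types line up; a minimal sketch (assuming the hash stays keyed by the integers 1-4 as defined above):
foreach ($AppGroupsKey in $AppGroupsKeys.Split(';'))
{
    $key = [int]$AppGroupsKey              # Read-Host returns strings, but the hash keys are ints
    if ($AppGroupsHash.ContainsKey($key))
    {
        $AppGroupsHash[$key]               # AppADGroup1, AppADGroup3, AppADGroup4 for input 1;3;4
    }
}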
I am new to scripting in PowerShell and come from a Python background. I want to know if I'm doing this right.
I created this array and want to extract each item one by one
$M365_E3_Grps = ("O365-CHN-DomainUser,O365-Vendor-Exchange-User")
ForEach ($Indiv_Grp in $M365_E3_Grps) {
$ADGroup = $Indiv_Grp
}
I want to know if we can extract values with a foreach loop like this and assign them to a variable like this.
Construct of your array
Your array is not quite correct and will be populated as a single string. To create a string array you will need to quote each item in a comma-separated list. The parentheses are also not required.
$M365_E3_Grps = "O365-CHN-DomainUser","O365-Vendor-Exchange-User"
Your foreach keyword syntax is however correct, even if the formatting in your question was slightly off.
foreach ($Indiv_Grp in $M365_E3_Grps) {
# Assigning $Indiv_Grp to $ADGroup here is kind of redundant since
# the value is already assigned to $Indiv_Grp
$Indiv_Grp
}
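If the goal is to hold on to each group name for later use (rather than re-assigning it inside the loop), one option is to let the loop's output collect into a variable. A quick sketch using the corrected array above:
$M365_E3_Grps = "O365-CHN-DomainUser","O365-Vendor-Exchange-User"

# everything the loop emits is collected into $ADGroups as an array
$ADGroups = foreach ($Indiv_Grp in $M365_E3_Grps) {
    $Indiv_Grp
}

$ADGroups[0]   # O365-CHN-DomainUser
$ADGroups[1]   # O365-Vendor-Exchange-User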
I have a "structured" file (logical fixed-length records) from a legacy program on a legacy (non-MS) operating system. I know how the records were structured in the original program, but the original O/S handled structured data as a sequence of bytes for file I/O, so a hex dump won't show you anything more than what the record length is (there are marker bytes and other record overhead imposed by the access method API used to generate the file originally).
Once I have the sequence of bytes in a Powershell variable, with the overhead bytes "cut away", how can I convert this into a structured object? Some of the "fields" are 16-bit integers, some are strings of the form [s]data (where [s] is a byte giving the length of the "real" data in that field), some are BCD coded fixed-point numbers, some are IEEE floats.
(I haven't been specific about the structure, either on the Powershell side or on the legacy side, because I am seeking a more-or-less 'generic' solution/technique, as I actually have several different files with different record structures to process.)
Initially, I tried to do it by creating a type that could take the buffer and overwrite a struct so that all the fields were nicely filled in. However, certain issues arose (regarding struct layout, fixed buffers and mixing fixed and managed members) and I also realised that there was no guarantee that the data in the buffer would be properly (or even legally) aligned, so I decided to try a more programmatic path.
"Manual" parsing is out, so how about automatic parsing? You're going to need to define the members of your PSObject at some point, so why not do it in a way that can help programmatically parse the data. This method does not require the data in the buffer to be correctly aligned or even contiguous. You can also have fields overlap to separate raw unions into the individual members (though, typically, only one will contain a "correct" value).
First step, build a hash table to identify the members, the offset in the buffer, their data types and, if an array, the number of elements :
$struct = @{
field1 = 0,[int],0; # 0 means not an array
field2 = 4,[byte],16; # a C string maybe
field3 = 24,[char],32; # wchar_t[32] ? note: skipped over bytes 20-23
field4 = 56,[double],0
}
# the names field1/2/3/4 are arbitrary, any valid member name may be used (but not
# necessarily any valid hash key if you want a PSObject as the end result).
# also, the values could be hash tables instead of arrays. that would allow
# descriptive names for the values but doesn't affect the end result.
Next, use [BitConverter] to extract the required data. The problem here is that we need to call the correct method for all the varying types. Just use a (big) switch statement. The basic principle is the same for most values: get the type indicator and initial offset from the $struct definition, then call the correct [BitConverter] method and supply the buffer and initial offset, update the offset to where the next element of an array would be, and then repeat for as many array elements as are required. The only trap here is that the data in the buffer must have the same format as expected by [BitConverter], so for the [double] example, the bytes in the buffer must conform to IEEE-754 floating point format (assuming that [BitConverter]::ToDouble() is used). Thus, for example, raw data from a Paradox database will need some tweaking because it flips the high bit to simplify sorting.
$struct.keys | foreach {
# key order is undefined but that won't affect the final object's members
$hashobject = @{}
} {
$fieldoffs = $struct[$_][0]
$fieldtype = $struct[$_][1]
if (($arraysize = $struct[$_][2]) -ne 0) { # yes, I'm a C programmer from way back
$array = @()
} else {
$array = $null
}
:w while ($arraysize-- -ge 0) {
switch($fieldtype) {
([int]) {
$value = [bitconverter]::toint32($buffer, $fieldoffs)
$fieldoffs += 4
}
([byte]) {
$value = $buffer[$fieldoffs++]
}
([char]) {
$value = [bitconverter]::tochar($buffer, $fieldoffs)
$fieldoffs += 2
}
([string]) { # ANSI string, 1 byte per character
$array = new-object string (,[char[]]$buffer[$fieldoffs..($fieldoffs+$arraysize)])
# $arraysize has already been decremented so don't need to subtract 1
break w # "array size" was actually string length so don't loop
#
# description:
# first, get a slice of the buffer as a byte[] (assume single byte characters)
# next, convert each byte to a char in a char[]
# then, invoke the constructor String(Char[])
# finally, put the String into $array ready for insertion into $hashobject
#
# Note the convoluted syntax - New-Object expects the second argument to be
# an array of the constructor parameters but String(Char[]) requires only
# one argument that is itself an array. By itself,
# [char[]]$buffer[$fieldoffs..($fieldoffs+$arraysize)]
# is treated by PowerShell as an argument list of individual chars, corrupting the
# constructor call. The normal trick is to prepend a single comma to create an array
# of one element which is itself an array
# ,[char[]]$buffer[$fieldoffs..($fieldoffs+$arraysize)]
# but this won't work because of the way PowerShell parses the command line. The
# space before the comma is ignored so that instead of getting 2 arguments (a string
# "String" and the array of an array of char), there is only one argument, an array
# of 2 elements ("String" and array of array of char) thereby totally confusing
# New-Object. To make it work you need to ALSO isolate the single element array into
# its own expression. Hence the parentheses
# (,[char[]]$buffer[$fieldoffs..($fieldoffs+$arraysize)])
#
}
}
if ($null -ne $array) {
# must be in this order* to stop the -ne from enumerating $array to compare against
# $null. this would result in the condition being considered false if $array were
# empty ( (@() -ne $null) -> $null -> $false ) or contained only one element with
# the value 0 ( (@(0) -ne $null) -> (scalar) 0 -> $false ).
$array += $value
# $array is not $null so must be an array to which $value is appended
} else {
# $array is $null only if $arraysize -eq 0 before the loop (and is now -1)
$array = $value
# so the loop won't repeat thus leaving this one scalar in $array
}
}
$hashobject[$_] = $array
}
#*could have reversed it as
# if ($array -eq $null) { scalar } else { collect array }
# since the condition will only be true if $array is actually $null or contains at
# least 2 $null elements (but no valid conversion will produce $null)
At this point there is a hash table, $hashobject, with keys equal to the field names and values containing the bytes from the buffer arranged into single (or arrays of) numeric (inc. char/boolean) values or (ANSI) strings. To create a (proper) object, just invoke New-Object -TypeName PSObject -Property $hashobject or use [PSCustomObject]$hashobject.
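To tie it together, here is a usage sketch; the file name 'records.dat', the 64-byte record length and the field names are hypothetical and simply match the $struct example above:
# hypothetical usage: read one logical record's bytes and build the object
$bytes  = [System.IO.File]::ReadAllBytes('records.dat')
$buffer = [byte[]]$bytes[0..63]    # one record, with the overhead bytes already cut away
# ... run the $struct / $hashobject loop above against $buffer ...
$record = [PSCustomObject]$hashobject
$record.field1                     # the [int] at offset 0
$record.field4                     # the IEEE double at offset 56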
Of course, if the buffer actually contained structured data then the process would be more complicated but the basic procedure would be the same. Note also that the "types" used in the $struct hash table have no direct effect on the resultant types of the object members, they are only convenient selectors for the switch statement. It would work just as well with strings or numbers. In fact, the parentheses around the case labels are because switch parses them the same as command arguments. Without the parentheses, the labels would be treated as literal strings. With them, the labels are evaluated as a type object. Both the label and the switch value are then converted to strings (that's what switch does for values other than script blocks or $null) but each type has a distinct string representation so the case labels will still match up correctly. (Not really on point but still interesting, I think.)
Several optimisations are possible but increase the complexity slightly. E.g.
([byte]) { # already have a byte[] so why collect bytes one at a time
if ($arraysize -ge 0) { # was originally -gt 0 so want a byte[]
$array = [byte[]]$buffer[$fieldoffs..($fieldoffs+$arraysize)]
# slicing the byte array produces an object array (of bytes) so cast it back
} else { # $arraysize was 0 so just a single byte
$array = $buffer[$fieldoffs]
}
break w # $array ready for insertion into $hashobject, don't need to loop
}
"But what if my strings are actually Unicode?", you say. Easy, just use existing methods from the [Text.Encoding] class:
([string]) { # Unicode string, 2 (LE) bytes per character
$array = [text.encoding]::unicode.getstring([byte[]]$buffer[$fieldoffs..($fieldoffs+$arraysize*2+1)])
# $arraysize should be the string length so, initially, $arraysize*2 is the byte
# count and $arraysize*2-1 is the end index (relative to $fieldoffs) but $arraysize
# was decremented so the end index is now $arraysize*2+1, i.e. length*2-1 = (length-1)*2+1
break w # got $array, no loop
}
You could also have both ANSI and Unicode by utilising a different type indicator for the ANSI string, maybe [char[]]. Remember, the type indicators do not affect the result, they just have to be distinct (and hopefully meaningful) identifiers.
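The BCD fixed-point fields mentioned in the question would need their own type indicator and case as well. A rough sketch of the unpacking, assuming packed BCD with two digits per byte (the nibble order, sign handling and implied decimal point all depend on the legacy format):
# sketch only: unpack packed BCD (two decimal digits per byte) into an integer
function ConvertFrom-PackedBcd {
    param([byte[]]$Bytes)
    $digits = foreach ($b in $Bytes) {
        $b -shr 4        # high nibble first
        $b -band 0x0F    # then the low nibble
    }
    [int64]($digits -join '')
}

ConvertFrom-PackedBcd -Bytes ([byte[]](0x12, 0x34, 0x56))   # 123456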
I realise that this is not quite the "just dump the bytes into a union or variant record" solution mentioned in the OP's comment, but PowerShell is based on .NET and uses managed objects where this sort of thing is largely prohibited (or difficult to get working, as I found). For example, assuming you could just dump raw chars (not bytes) into a String, how would the Length property get updated? This method also allows some useful preprocessing, such as splitting up unions as noted above or converting raw byte or char arrays into the Strings they represent.
I am having a little bit of trouble with hashtables/dictionaries in powershell. The most recent roadblock is the ability to find the index of a key in an ordered dictionary.
I am looking for a solution that isn't simply iterating through the object.
(I already know how to do that)
Consider the following example:
$dictionary = [Ordered]@{
'a' = 'blue';
'b'='green';
'c'='red'
}
If this were a normal array I'd be able to look up the index of an entry by using IndexOf().
[array]::IndexOf($dictionary,'c').
That would return 2 under normal circumstances.
If I try that with an ordered dictionary, though, I get -1.
Any solutions?
Edit:
In case anyone reading over this is wondering what I'm talking about: what I was trying to use this for was to create an object that normalizes property entries in a way that also has a numerical order.
I was trying to use this for the status of a process, for example:
$_processState = [Ordered]@{
'error' = 'error'
'none' = 'none'
'started' = 'started'
'paused' = 'paused'
'cleanup' = 'cleanup'
'complete' = 'complete'
}
If you were able to easily do this, the above object would give $_processState.error an index value of 0 and ascend through each entry, finally giving $_processState.complete an index value of 5. Then if you compared two properties, by "index value", you could see which one is further along by simple operators. For instance:
$thisObject.Status = $_processState.complete
If ($thisObject.Status -ge $_processState.cleanup) {Write-Host 'All done!'}
PS > All done!
^^that doesn't work as is, but that's the idea. It's what I was aiming for. Or maybe to find something like $_processState.complete.IndexNumber()
Having an object like this also lets you assign values by the index name, itself, while standardizing the options...
$thisObject.Status = $_processState.paused
$thisObject.Status
PS > paused
Not really sure this was the best approach at the time or if it still is the best approach with all the custom class options there are available in PS v5.
It can be simpler
It may not be any more efficient than the answer from Frode F., but a perhaps more concise (inline) option is simply putting the hash table's keys collection in a subexpression ($()) and then calling IndexOf() on the result.
For your hash table...
Your particular expression would be simply:
$($dictionary.keys).indexOf('c')
...which gives the value 2 as you expected. This also works just as well on a regular hashtable... unless the hashtable is modified in pretty much any way, of course... so it's probably not very useful in that case.
In other words
Using this hash table (which also shows many of the ways to encode 4...):
$hashtable = [ordered]@{
sample = 'hash table'
0 = 'hello'
1 = 'goodbye'
[char]'4' = 'the ansi character 4 (code 52)'
[char]4 = 'the ansi character code 4'
[int]4 = 'the integer 4'
'4' = 'a string containing only the character 4'
5 = "nothing of importance"
}
would yield the following expression/results pairs:
# Expression Result
#------------------------------------- -------------
$($hashtable.keys).indexof('5') -1
$($hashtable.keys).indexof(5) 7
$($hashtable.keys).indexof('4') 6
$($hashtable.keys).indexof([char]4) 4
$($hashtable.keys).indexof([int]4) 5
$($hashtable.keys).indexof([char]'4') 3
$($hashtable.keys).indexof([int][char]'4') -1
$($hashtable.keys).indexof('sample') 0
by the way:
[int][char]'4' equals [int]52
[char]'4' has a "value" (magnitude?) of 52, but is a character, so it's used as such
...gotta love the typing system, which, while flexible, can get really really bad at times, if you're not careful.
Dictionaries use keys and not indexes. OrderedDictionary combines a hashtable and an ArrayList to give you order/index support in a dictionary; however, it's still a dictionary (key-based) collection.
If you need to get the index of a key in an OrderedDictionary (or a hashtable) you need to use a foreach loop and a counter. Example (should be created as a function):
$hashTable = [Ordered]@{
'a' = 'blue';
'b'='green';
'c'='red'
}
$i = 0
foreach($key in $hashTable.Keys) {
if($key -eq "c") { $i; break }
else { $i++ }
}
That's how it works internally too. You can verify this by reading the source code for OrderedDictionary's IndexOfKey method in the .NET Reference Source.
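Wrapped up as the function suggested above, it might look like this sketch (the name Get-KeyIndex is just an example):
function Get-KeyIndex {
    param([System.Collections.IDictionary]$Dictionary, $Key)

    $i = 0
    foreach ($k in $Dictionary.Keys) {
        if ($k -eq $Key) { return $i }
        $i++
    }
    -1    # key not present
}

Get-KeyIndex -Dictionary $hashTable -Key 'c'    # returns 2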
For the initial problem I was attempting to solve, a comparable process state, you can now use Enumerations starting with PowerShell v5.
You use the Enum keyword, set the Enumerators by name, and give them an integer value. The value can be anything, but I'm using ascending values starting with 0 in this example:
Enum _ProcessState{
Error = 0
None = 1
Started = 2
Paused = 3
Cleanup = 4
Complete = 5
Verified = 6
}
#the leading _ for the Enum is just cosmetic & not required
Once you've created the Enum, you can assign it to variables. The contents of the variable will return the text name of the Enum, and you can compare them as if they were integers.
$Item1_State = [_ProcessState]::Started
$Item2_State = [_ProcessState]::Cleanup
#return state of second variable
$Item2_state
#comparison
$Item1_State -gt $Item2_State
Will return:
Cleanup
False
If you wanted to compare and return the highest:
#sort the two objects, then return the first result (should return the item with the largest enum int)
$results = ($Item1_State,$Item2_State | Sort-Object -Descending)
$results[0]
Fun fact, you can also use arithmetic on them, for example:
$Item1_State + 1
$Item1_State + $Item2_State
Will return:
Paused
Verified
More info on Enum here:
https://blogs.technet.microsoft.com/heyscriptingguy/2015/08/26/new-powershell-5-feature-enumerations/
https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_enum?view=powershell-6
https://psdevopsug.scot/post/working-with-enums-in-powershell/
I was hoping that someone might be able to assist me. I'm new to Perl and generally getting some good results from the small scripts I've written; however, I'm stuck on a nested while loop in a new script I'm working on.
The script I've put together performs two MySQL select statements and then places the results into two separate arrays. I then want to check the first element in the first array against all of the results in the second array, then move to the second element in the first array and check it against all results in the second array, and so on.
The goal of the script is to find an IP address in the first array and see which subnets it fits into in the second...
What I find happening is that the script runs through only the first element of the first array and all elements of the second array, then stops.
Here is the relevant extract of the Perl script - if anyone could point me in the right direction I would really appreciate it.
my @ip_core_wan_field;
while ( @ip_core_wan_field = $wan_core_collection->fetchrow_array() ) {
my $coreipAddr = @ip_core_wan_field[1];
my @ip_wan_field;
while ( @ip_wan_field = $wan_collection->fetchrow_array() ) {
my $ipAddr = @ip_wan_field[1];
my $network = NetAddr::IP->new( @ip_wan_field[4], @ip_wan_field[5] );
my $ip = NetAddr::IP->new($coreipAddr);
if ( $ip->within($network) && $ip ne $ipAddr ) {
print "$ip IS IN THE SAME subnet as $network \n";
}
else {
print "$coreipAddr is outside the subnet for $network\n\n";
}
}
}
Your SQL queries are single-pass operations. If you want to loop over the second collection more than once, you need to either cache the values and iterate over the cache, or rerun the query.
I would of course advise that you go with the first option, using fetchall_arrayref:
my $wan_arrayref = $wan_collection->fetchall_arrayref;
while ( my @ip_core_wan_field = $wan_core_collection->fetchrow_array() ) {
my $coreipAddr = @ip_core_wan_field[1];
for my $ip_wan_field_ref (@$wan_arrayref) {
my @ip_wan_field = @$ip_wan_field_ref;
# ... the rest of your original inner-loop body goes here unchanged ...
}
}
There are of course other ways to make this operation more efficient, but that's the crux of your current problem.