SelectNodes in XPathNodeList - PowerShell

Given an XPathNodeList derived with something like this
$settingsNodes = $temp.xml.SelectNodes('/Settings/*')
And given a particular node Name or ID, I can iterate the XPathNodeList, test for a match and increment a counter. However, I wonder if there is something more elegant, along the lines of
$count = $settingsNodes.SelectNodes("//$nodeName")
That doesn't work because an XPathNodeList doesn't have a SelectNodes method, and I can't seem to find anything that does work. Googling XPathNodeList SelectNodes returns all sorts of references to SelectNodes as a method of XmlNode, but nothing on XPathNodeList.
My specific condition means I am almost certainly never looping through more than a few hundred to perhaps a thousand nodes, so maybe it really doesn't matter; it just seems like there is probably a more graceful solution and I just haven't found it.
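(As a hedged aside, one workaround that does work is querying the document itself, which does have SelectNodes, with a dynamic name filter; a sketch, not a definitive answer.)
# Sketch: the document has SelectNodes even though the XPathNodeList does not
$count = $temp.xml.SelectNodes("/Settings/*[name()='$nodeName']").Count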
EDIT: For additional context.
In one condition I might have this XML and I just want to catch and log the duplicate UserLogFilePath node.
<Settings>
    <JobsXML>[Px~Folder]\Resources\jobs.xml</JobsXML>
    <JobLogFilePath>[Px~Folder]\Logs</JobLogFilePath>
    <JobLogFileName>[Px~StartTime] Job [Px~Job][Px~Error]</JobLogFileName>
    <MachineLogFilePath>[Px~Folder]\Logs</MachineLogFilePath>
    <MachineLogFileName>[Px~Now] [Px~Action] [Px~Set] on [Px~Computer][Px~Error][Px~Validation]</MachineLogFileName>
    <ResetLogFilePath>[Px~Folder]\Logs</ResetLogFilePath>
    <ResetLogFileName>[Px~Now] [Px~Action] on [Px~Computer][Px~Error][Px~Validation]</ResetLogFileName>
    <UserLogFilePath>[Px~Folder]\Logs</UserLogFilePath>
    <UserLogFilePath>C:\Program Files</UserLogFilePath>
    <UserLogFileName>[Px~User] on [Px~Computer]</UserLogFileName>
    <UserContextMember></UserContextMember>
    <UserContextNotMember>Administrators</UserContextNotMember>
</Settings>
If there are no duplicates the entire Settings node is imported into my XML variable in memory.
Later, I might have this XML and I want to catch duplicate IDs, both in and outside of any Product_Group.
<Definitions>
    <Products>
        <Product_Group id="Miscellaneous">
            <Product id="ADSK360">
                <DefaultShortcut>Autodesk 360.lnk</DefaultShortcut>
                <ProgramDataFolder>C:\ProgramData\Autodesk\Autodesk ReCap</ProgramDataFolder>
                <ProgramFolder>C:\Program Files\Autodesk\Autodesk Sync</ProgramFolder>
                <ShortcutPath>C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Autodesk</ShortcutPath>
            </Product>
            <Product id="ADSK360">
                <DefaultShortcut>Autodesk 360.lnk</DefaultShortcut>
                <ProgramDataFolder>C:\ProgramData\Autodesk\Autodesk ReCap</ProgramDataFolder>
                <ProgramFolder>C:\Program Files\Autodesk\Autodesk Sync</ProgramFolder>
                <ShortcutPath>C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Autodesk</ShortcutPath>
            </Product>
        </Product_Group>
        <Product id="ADSK360">
            <DefaultShortcut>Autodesk 360.lnk</DefaultShortcut>
            <ProgramDataFolder>C:\ProgramData\Autodesk\Autodesk ReCap</ProgramDataFolder>
            <ProgramFolder>C:\Program Files\Autodesk\Autodesk Sync</ProgramFolder>
            <ShortcutPath>C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Autodesk</ShortcutPath>
        </Product>
    </Products>
</Definitions>
If there are no duplicates the Product_Groups are eliminated (it's just a convenience for users managing their XML) and all the Products are imported into the Products node in memory.
Subsequent product files are checked first for internal duplicates, and if none are found checked for duplicates already in the main XML, and if none found the file's XML is merged. This repeats for potentially a hundred or more total files in some cases.
The current loop-based approach is inelegant, especially for the internal duplicates test. I make an XPathNodeList, say of all the Products, in the candidate XML. This is where SelectNodes is nice, because it can find Products whether or not they are in a Product_Group. Then I loop through each Product and test it against the whole XPathNodeList. Ugly, because a list of only 10 nodes means 100 times through the loop. When testing the candidate against the final XML it's more efficient, but still ugly.
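(A hedged aside: XPath 1.0 can flag duplicates on its own via the preceding axis, which might replace the nested loop for the internal-duplicates test. $candidateXml below is a hypothetical stand-in for whatever document holds the candidate file.)
# Sketch: select every Product whose id matches the id of some earlier Product,
# whether or not either one sits inside a Product_Group
$dupes = $candidateXml.SelectNodes("//Product[@id = preceding::Product/@id]")
$dupes | ForEach-Object { Write-Host "Duplicate Product id: $($_.GetAttribute('id'))" }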
EDIT #2:
Taking a stab at using Select, I have this correctly finding duplicate nodes.
$settingsNodes = $temp.xml.SelectNodes('/Settings/*')
Write-Host "$($settingsNodes.count)"
Write-Host "$(($settingsNodes | select -unique).count)"
But how does one find only duplicate IDs? Better yet, duplicate IDs with the same node name, since a different node name but the same ID would actually not be a duplicate? -Unique is a switch, I see, so my guess is I am about to learn something more about pipelining, because I need to extract the IDs to pipe to Select -Unique, which isn't Where-Object. How does one just pull the IDs in this case?
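(One pipeline-shaped possibility, sketched under the assumption that the id lives in an id attribute as in the product sample: group on node name plus id and keep any group with more than one member.)
# Sketch: nodes are duplicates only when both the element name and the id attribute repeat
$dupes = $settingsNodes |
    Group-Object -Property { '{0}|{1}' -f $_.Name, $_.GetAttribute('id') } |
    Where-Object { $_.Count -gt 1 }
$dupes | ForEach-Object { Write-Host "Duplicate: $($_.Name) (x$($_.Count))" }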
OK, gut test here. I think maybe I am trying to be a bit too clever when in fact there is a much easier solution. To wit...
$settingsNodes = $temp.xml.SelectNodes('/Settings/*')
$unique = $duplicate = @()
foreach ($node in $settingsNodes) {
    if ($unique -contains $node.Name) {
        $duplicate += $node.Name
    } else {
        $unique += $node.Name
    }
}
Write-Host "u: $unique"
Write-Host " "
Write-Host "d: $duplicate"
Swap Name for ID and it works for that too. The SelectNodes will take care of eliminating the Product_Group. And similarly I can build an array of the nodes in Final to test against with the now unique list of candidate nodes.
So, am I missing something that is going to bite me? Or should I just go ahead and kick myself? ;)
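(If the quadratic cost of -contains against a growing array ever matters, a HashSet keeps it linear; a hedged sketch below, using PowerShell 5+ ::new() syntax.)
# Sketch: HashSet.Add returns $false when the value is already present,
# so duplicates fall out in one pass without repeated -contains scans
$seen = [Collections.Generic.HashSet[string]]::new()
$duplicate = foreach ($node in $settingsNodes) {
    if (-not $seen.Add($node.Name)) { $node.Name }
}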

Related

Powershell circular dependencies

I have a scenario where I THINK a circular dependency is the right answer. In my actual code I have a class that contains some data that is used for token replacements. That class contains other classes for the data, and contains a method that does the lookup of token values and returns the value. However, those dependent classes need to be validated, and some of those values are dependent on lookups. So, I have mocked up this code to validate the approach.
class Tokens {
    [Collections.Specialized.OrderedDictionary]$primaryData
    [Collections.Specialized.OrderedDictionary]$secondaryData
    Tokens ([Secondary]$secondary) {
        $this.primaryData = [Ordered]@{
            '1' = 'One'
            '2' = 'Two'
            '3' = 'Three'
            '4' = 'Four'
            '5' = 'Five'
        }
        $this.secondaryData = $secondary.secondaryData
    }
    [String] GetToken ([String]$library, [String]$item) {
        return $this.$library.$item
    }
    [Void] SetToken ([String]$library, [String]$item, [String]$value) {
        $this.$library.$item = $value
    }
    [String] ToString () {
        [Collections.Generic.List[String]]$toString = [Collections.Generic.List[String]]::new()
        foreach ($key in $this.primaryData.Keys) {
            $toString.Add("$key : $($this.primaryData.$key)")
        }
        foreach ($key in $this.secondaryData.Keys) {
            $toString.Add("$key : $($this.secondaryData.$key)")
        }
        return [string]::Join("`n", $toString)
    }
}
class Secondary {
    [Collections.Specialized.OrderedDictionary]$secondaryData
    Secondary () {
        $this.secondaryData = [Ordered]@{
            'A' = 'a'
            'B' = 'b'
            'C' = 'c'
            'D' = 'd'
            'E' = 'e'
        }
    }
    [Void] Update ([Tokens]$tokensReference) {
        $tokensReference.SetToken('secondaryData', 'A', 'A')
        $tokensReference.SetToken('secondaryData', 'B', "$($tokensReference.GetToken('secondaryData', 'A')) and $($tokensReference.GetToken('secondaryData', 'B'))")
    }
    [String] ToString () {
        [Collections.Generic.List[String]]$toString = [Collections.Generic.List[String]]::new()
        foreach ($key in $this.secondaryData.Keys) {
            $toString.Add("$key : $($this.secondaryData.$key)")
        }
        return [string]::Join("`n", $toString)
    }
}
CLS
$secondary = [Secondary]::new()
$tokens = [Tokens]::new($secondary)
$secondary.Update($tokens)
Write-Host "$($tokens.ToString())"
This is working exactly as expected. However, the idea of circular dependency injection has my hair standing on end. Like, it could be a real problem, or at least it is a code smell. So, my question is, am I on the right track, or is this a dead end and I just haven't found that end yet? Given that PowerShell isn't "fully" object oriented yet, I imagine there could be some uniquely PS-related issues, and everything I have found searching for PowerShell circular dependency talks about removing them. So far I haven't found anything about when it is appropriate and how to do it well.
And, assuming it is a valid approach, is there anything obvious in this simplified implementation that could lead to problems or could be better done some other way?
EDIT: OK, so perhaps I am going to refine my vocabulary a bit too. I was thinking circular dependency since Secondary is a dependency, or member perhaps, of Tokens, and then I update Secondary from inside Secondary, while referencing data from Secondary, via the method in Tokens.
To clarify (I hope) the ultimate goal, these lookups are for program specific data, which I have in XML files. So, for example the data file for Autodesk Revit 2021 would include these three items
<item id="GUID" type="literal">{7346B4A0-2100-0510-0000-705C0D862004}</item>
<item id="installDataKey" type="registryKeyPath">HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\[Revit 2021~GUID]</item>
<item id="displayName" type="registryPropertyValue" path="installDataKey">Revit 2021</item>
In actual use I would want to get the DisplayName property found in the key defined in <item id="installDataKey">, and if the value matches the value in <item id="displayName"> then I might also look for the value of the DisplayVersion property in the same key, and make decisions based on that. But because there are new versions every year, and 20+ different software packages to address, managing these data files is a pain. So I want to validate the data I have in the files against a machine that actually has the software installed, to be sure I have my data correct. Autodesk is famous for changing things for no good reason, and often for some very customer hostile reasons. So, things like referencing the GUID as data and reusing it as a token, i.e. the [Revit 2021~GUID] above, saves effort. So during the validation process only, I would want to set the GUID, then do the standard token replacement to convert HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\[Revit 2021~GUID] to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\{7346B4A0-2100-0510-0000-705C0D862004} and should that key actually be found, use it to validate about 20 other registry and file paths as well as the actual value of DisplayName. If everything validates I will sign the XML, and in actual use signed XML will basically treat everything as a literal and no validation is done, or rather it was prevalidated.
But before use, a reference to [product~installDataKey], when the current product is Revit 2021, would resolve first to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\[Revit 2021~GUID], then to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\{7346B4A0-2100-0510-0000-705C0D862004}, at which point the code could use it as a registry path and see if Revit 2021 is in fact installed.
So, 99.9% of the time, I would instantiate Tokens with a constructor that just includes the full dataset and move on. But on the .1% of occasions where I am validating the data itself, I need to be able to read the xml, set the value for that GUID and immediately use the lookup to validate that Autodesk hasn't done something stupid like move some data out of the GUID key and into a secondary key. They have done that before.
I also might want to replace HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall with a token like [windows~uninstallKey] just to make life easier there too.
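(A minimal sketch of that kind of recursive expansion, assuming a GetToken-style lookup like the mock above and tokens shaped [library~item]; Resolve-PxTokens is a hypothetical helper name, and real library names like 'Revit 2021' would need to map onto whatever GetToken expects.)
# Hypothetical sketch: expand [library~item] tokens until none remain, so a nested
# value like [Revit 2021~GUID] resolves through successive passes
function Resolve-PxTokens {
    param([string]$Text, [Tokens]$Tokens)
    while ($Text -match '\[(?<lib>[^~\[\]]+)~(?<item>[^\[\]]+)\]') {
        $value = $Tokens.GetToken($Matches.lib, $Matches.item)
        if (-not $value) { break }  # unresolvable token: bail out rather than loop forever
        $Text = $Text.Replace($Matches[0], $value)
    }
    return $Text
}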
Hopefully that makes some sense. It's a mess to be sure, but anything to do with Autodesk is a mess.

O365 Powershell | Breaking up a long list into two sets of 100

I am looking to create a rule in Office 365 applied to all of the members in our org.
I would like this rule to append a warning on all incoming email from outside the organization with the same Display Names as our users.
When I attempt to apply it to all of the users in our org I get an error stating that the rule is too long.
In order to solve that I pulled a group, but I am still about 1000 characters over the limit.
I would like to make two variables, that each hold one half of the list, created by this command:
(Get-DistributionGroupMember -Identity email@contoso.com -ResultSize Unlimited).DisplayName
I have attempted to modify the ResultSize parameter, but what I would need is results 1-100 and then 101-200 from the same list.
Another caveat to this problem is that the list cannot be static. It is something that the script will have to update every time it is run.
There is a sub-string command that you can use on a particular username that I have utilized when I made something for AD, but I am not aware of any way to break up a list like this.
If anyone has any other ways to solve this issue I would be more than open to any suggestion.
Thanks for taking the time to read this!
There are many ways of doing it. My favorite one, which I find very readable, is this:
$ObjectList = 1..1000
$Step = 100
$counter = [pscustomobject] @{ Value = 0 }
$ObjectListSplitted = $ObjectList | Group-Object -Property { [math]::Floor($counter.Value++ / $Step) }
Then if you want to show the subset at index 3 (the fourth one), just use this format:
$ObjectListSplitted[3].Group
Have a look at this solution, already explained.
As a note, other languages are capable of slicing an array of objects with a start, stop, and a step; have a look here if you're curious.
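Applied back to the question's two-variable requirement, a minimal sketch (assuming the corrected Get-DistributionGroupMember call from the question and a non-empty result):
# Sketch: split the display-name list into two halves, recomputed on every run
$names = @((Get-DistributionGroupMember -Identity email@contoso.com -ResultSize Unlimited).DisplayName)
$mid = [math]::Ceiling($names.Count / 2)
$firstHalf = $names[0..($mid - 1)]
$secondHalf = $names[$mid..($names.Count - 1)]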

Arguments passed byReference and/or mutable constant issue

Not sure how I can do a minimal working code example here, but I will at least try to explain what is going on.
I have a utility that processes some text files to extract data, and then provides that data back in various ways depending on command line options and some data in an XML file. There is a variety of "Processes" I could look for in the data, and I can output the final data in one or more formats (txt, csv, xml), with potentially different "Processes" being output to different file types. And since I am processing potentially hundreds of txt files, I want to multi-thread this. So I have created a function, and since the mix of processes to monitor and output types to emit is the same for every txt file, and I want to compile everything into a single big data structure at the end, I have created that final data structure as a hash of hashes, but empty of data. Then I use
Set-Variable resultsContainer -option:constant -value:$results
to make a constant, which is empty of data, to hand off to the function. The idea being I can hand the function an empty container plus the path of a txt file, and it can fill the container and return it, where I can in theory use it to fill the mutable data container. So I do a foreach on the txt files, and pass what SHOULD be an empty container each time, like this.
$journalResults = Invoke-PxParseJournal -journal "$source\$journal" -container $resultsContainer
However, instead of staying empty, as I would expect for a constant, I am effectively passing all the previous journals' data to each successive iteration of the function. I proved this to myself by initializing a counter to 1 and then running this loop after the Invoke-PxParseJournal call.
foreach ($process in $resultsContainer.Keys) {
    foreach ($output in $resultsContainer.$process.Keys) {
        foreach ($item in $resultsContainer.$process.$output) {
            Write-Host "$(Split-Path $journal -leaf) $process $output $item$('!' * $count)"
        }
    }
}
After the first Invoke the loop produces nothing, but from there everything is appended. So I see results like this
journal.0002.txt Open_Project csv Gordon,1/20/2017 12:08:43 AM,Open an existing project,.\RFO_Benchmark - previous.rvt,0:00:22.012!!
journal.0003.txt Open_Project csv Gordon,1/20/2017 12:08:43 AM,Open an existing project,.\RFO_Benchmark - previous.rvt,0:00:22.012!!!
journal.0004.txt Open_Project csv Gordon,1/20/2017 12:08:43 AM,Open an existing project,.\RFO_Benchmark - previous.rvt,0:00:22.012!!!!
Identical repeats each time. Even odder, if I rerun the script in the console I WILL get an error saying
Set-Variable : Cannot overwrite variable resultsContainer because it is read-only or constant.
But still the results are data being appended. Now, my first thought was that because I was using the same variable name in the function as in the root script I was dealing with some scoping problem, so I changed the name of the variable in the function and gave it an alias, like this.
[Alias('container')]$parseJournal
I then populate and return the $parseJournal variable.
No change in behavior. Which now has me wondering if I just don't understand how parameters are passed. I had thought it was ByVal, but this is acting like it is ByReference, so even with the name change I am in fact just adding to the same data structure in memory each time.
Is there something obvious here that I am missing? FWIW, I am in PS 2.0 at the moment. I don't have a Win10 VM I can spin up easily at the moment to test there.
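(A hedged note on the likely mechanism: hashtables are reference types, and -Option Constant protects only the variable binding, not the object it points to, so every call mutates the same container. A per-iteration clone sidesteps that; sketch below, with $journals standing in for the question's file list.)
# Sketch: give each call its own container so fills inside the function
# cannot accumulate across iterations ($resultsContainer is the empty template)
foreach ($journal in $journals) {
    $freshContainer = @{}
    foreach ($key in $resultsContainer.Keys) {
        # Clone() is shallow; it copies each inner hashtable one level down
        $freshContainer[$key] = $resultsContainer[$key].Clone()
    }
    $journalResults = Invoke-PxParseJournal -journal "$source\$journal" -container $freshContainer
}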

How can you filter on a custom value created during dehydration?

During dehydration I create a custom value:
def dehydrate(self, bundle):
    bundle.data['custom_field'] = ["add lots of stuff and return an int"]
    return bundle
that I would like to filter on.
/?format=json&custom_field__gt=0...
however I get an error that the "[custom_field] field has no 'attribute' for searching with."
Maybe I'm misunderstanding custom filters, but in both build_filters and apply_filters I can't seem to get access to my custom field to filter on it. On the examples I've seen, it seems like I'd have to redo all the work done in dehydrate in build_filters, e.g.
for all the items:
    item['custom_field'] = ["add lots of stuff and return an int"]
    filter on item and add to pk_list
orm_filters["pk__in"] = [i.pk for i in pk_list]
which seems wrong, as I'm doing the work twice. What am I missing?
The problem is that dehydration is "per object" by design, while filters are per object_list. That's why you will have to filter it manually and redo work in dehydration.
You can imagine it like this:
# Whole table
[obj, obj1, obj2, obj3, obj4, obj5, obj6]
# filter operations
[...]
# After filtering
[obj1, obj3, obj6]
# Returning
[dehydrate(obj1), dehydrate(obj3), dehydrate(obj6)]
In addition, imagine fetching by filter and getting, let's say, 100 objects. It would be quite inefficient to trigger dehydrate on the whole table of, for instance, 100,000 records.
And maybe creating a new column in the model could be a candidate solution if you plan to use a lot of filters, ordering, etc. I guess it's a kind of statistical information in this field, so if not a new column then maybe Django aggregation could ease your pain a little.

Creating a sort of "composable" parser for log files

I've started a little pet project to parse log files for Team Fortress 2. The log files have an event on each line, such as the following:
L 10/23/2009 - 21:03:43: "Mmm... Cycles!<67><STEAM_0:1:4779289><Red>" killed "monkey<77><STEAM_0:0:20001959><Blue>" with "sniperrifle" (customkill "headshot") (attacker_position "1848 813 94") (victim_position "1483 358 221")
Notice there are some common parts of the syntax for log files. Names, for example, consist of four parts: the name, an ID, a Steam ID, and the team of the player at the time. Rather than rewriting this type of regular expression, I was hoping to abstract this out slightly.
For example:
my $name = qr/(.*)<(\d+)><(.*)><(Red|Blue)>/;
my $kill = qr/"$name" killed "$name"/;
This works nicely, but the regular expression now returns results that depend on the format of $name (breaking the abstraction I'm trying to achieve). The example above would match as:
my ($name_1, $id_1, $steam_1, $team_1, $name_2, $id_2, $steam_2, $team_2)
But I'm really looking for something like:
my ($player1, $player2)
Where $player1 and $player2 would be tuples of the previous data. I figure the "killed" event doesn't need to know exactly about the player, as long as it has information to create the player, which is what these tuples provide.
Sorry if this is a bit of a ramble, but hopefully you can provide some advice!
I think I understand what you are asking. What you need to do is reverse your logic. First you need a regex to split the string into two parts, then you extract your tuples. Then your regex doesn't need to know about the name, and you just have two generic player-parsing regexes. Here is a short example:
#!/usr/bin/perl
use strict;
use Data::Dumper;
my $log = 'L 10/23/2009 - 21:03:43: "Mmm... Cycles!<67><STEAM_0:1:4779289><Red>" killed "monkey<77><STEAM_0:0:20001959><Blue>" with "sniperrifle" (customkill "headshot") (attacker_position "1848 813 94") (victim_position "1483 358 221")';
my ($player1_string, $player2_string) = $log =~ m/"(.*?)" killed "(.*?)"/;
my @player1 = $player1_string =~ m/(.*)<(\d+)><(.*)><(Red|Blue)>/;
my @player2 = $player2_string =~ m/(.*)<(\d+)><(.*)><(Red|Blue)>/;
print STDERR Dumper(\@player1, \@player2);
Hope this is what you were looking for.
Another way to do it, but the same strategy as dwp's answer:
my @players =
    map { [ /(.*)<(\d+)><(.*)><(Red|Blue)>/ ] }
    $log_text =~ /"([^\"]+)" killed "([^\"]+)"/;
Your log data contains several items of balanced text (quoted and parenthesized), so you might consider Text::Balanced for parts of this job, or perhaps a parsing approach rather than a direct attack with regex. The regex approach might be fragile if the player names can contain arbitrary input, for example.
Consider writing a Regexp::Log subclass.