Get unique values - powershell

We're trying to optimize some code that removes duplicates from an Array as fast as possible. Normally this can be easily done by piping the input to Group-Object and then using only the Name property. But we would like to avoid the pipeline, as it is slower.
However, we tried the following code:
[System.Collections.ArrayList]$uniqueFrom = #()
$From = #('A', 'A', 'B')
$From.Where({-not ($uniqueFrom.Contains($_))}).ForEach({
$uniqueFrom.Add($_)
})
$uniqueFrom
In theory, this should work. But for one reason or another the output is not the expected #('A', 'B'). Why is it not reevaluating the ArrayList in the .where clause?

In my experience reducing the 'pipe filtering' to get the unique values can be achieve by using DataView. If you are processing an array you need to convert this to a DataTable first before you get the values using the DataView.
e.g.
$arr = #('val1','val1','val1','val2','val1','val3'....)
$newDatatable = New-Object System.Data.Datatable
[void]$newDatatable.Columns.Add("FetchUniqueColumn")
foreach($e in $arr)
{
$row = $newDatatable.NewRow()
$row.Item('FetchUniqueColumn') = $e
$newDatatable.Rows.Add($row)
}
$filterDataView = New-Object System.Data.Dataview($newDatatable)
$UniqueDT = $filterDataView.ToTable($true,'FetchUniqueColumn')
$UniqueValues_array = $UniqueDT.Rows.FetchUniqueColumn
Note this is a whole lot faster if your input is a DataTable since you don't have to convert it anymore prior to setting the DataView filter for unique values to $true in creating the $UniqueDT datatable from the dataview:
$UniqueDT = $filterDataView.ToTable($true,'FetchUniqueColumn')
Tested by querying 1 column with 3000 rows datatable from SQL.
My results are as follows:
**With 1 column Data Table as input
Select -Unique - 300 ms
Using DataView - 21 ms
**With #() array as input (converted SQL results to array prior to benchmarking)
Select Unique - 262 ms
Using DataView - 106 ms

Disclaimer: in this answer I'm just explaining why the current code isn't working, not attempting to give alternative solution. For solution check the accepted answer.
Why is it not reevaluating the ArrayList in the .where clause?
It's not supposed to do this. What it is actually doing is filtering here:
$From.Where({-not ($uniqueFrom.Contains($_))})
and then executing
$uniqueFrom.Add($_)
for each element. As you did
[System.Collections.ArrayList]$uniqueFrom = #()
this array is empty and therefore will return $false for any $uniqueFrom.Contains($_)
Proof:
To verify that what I've written above is true you can do the following:
[System.Collections.ArrayList]$uniqueFrom = #()
$uniqueFrom.add("A")
$From.Where({-not ($uniqueFrom.Contains($_))}).ForEach({
$uniqueFrom.Add($_)
})
Output is A, B (A was added manually, two A were skipped as this entry already exists in $uniqueFrom, B was added inside ForEach) as expected.

Related

Powershell filtering one list out of another list

<updated, added Santiago Squarzon suggest information>
I have two lists, I pull them from csv but there is only one column in each of the two lists.
Here is how I pull in the lists in my script
$orginal_list = Get-Content -Path .\random-word-350k-wo-quotes.txt
$filter_words = Get-Content -Path .\no_go_words.txt
However, I will use a typed list for simplicity in the code example below.
In this example, the $original_list can have some words repeated.
I want to filter out all of the words in $original_list that are in the $filter_words list.
Then add the filtered list to the variable $filtered_list.
In this example, $filtered_list would only have "dirt","turtle" in it.
I know the line I have below where I subtract the two won't work, it's there as a placeholder as I don't know what to use to get the result.
Of note, the csv file that feeds $original_list could have 300,000 or more rows, and $filter_words could have hundreds of rows. So would want this to be as efficient as possible.
The filtering is case insensitive.
$orginal_list = "yellow","blue","yellow","dirt","blue","yellow","turtle","dirt"
$filter_words = "yellow","blue","green","harsh"
$filtered_list = $orginal_list - $filter_words
$filtered_list
dirt
turtle
Use System.Collections.Generic.HashSet`1 and its .ExceptWith() method:
# Note: if possible, declare the lists as [string[]] arrays to begin with.
# Otherwise, use a [string[]] cast im the method calls below, which,
# however, creates a duplicate array on the fly.
[string[]] $orginal_list = "yellow","blue","yellow","dirt","blue","yellow","turtle","dirt"
[string[]] $filter_words = "yellow","blue","green","harsh"
# Create a hash set based on the strings in $orginal_list,
# with case-insensitive lookups.
$hsOrig = [System.Collections.Generic.HashSet[string]]::new(
$orginal_list,
[System.StringComparer]::CurrentCultureIgnoreCase
)
# Reduce it to those strings not present in $filter_words, in-place.
$hsOrig.ExceptWith($filter_words)
# Convert the filtered hash set to an array.
[string[]] $filtered_list = [string[]]::new($hsOrig.Count)
$hsOrig.CopyTo($filtered_list)
# Output the result
$filtered_list
The above yields:
dirt
turtle
To also speed up reading your input files, use the following:
# Note: System.IO.File]::ReadAllLines() returns a [string[]] instance.
$orginal_list = [System.IO.File]::ReadAllLines((Convert-Path .\random-word-350k-wo-quotes.txt))
$filter_words = [System.IO.File]::ReadAllLines((Convert-Path .\no_go_words.txt))
Note:
.NET generally defaults to (BOM-less) UTF-8; pass a [System.Text.Encoding] instance as a second argument, if needed.
.NET's working dir. usually differs from PowerShell's, so the use of full paths is always advisable in .NET API calls, and that is what the Convert-Path calls ensure.
I have found that using Linq to filter one list out from another is incredibly easy and incredibly fast (especially for large lists)
# ARRAY OF 1000 STRINGS LOWERCASE (item1 - item1000)
[string[]]$ThousandItems = 1..1000 | %{"item$_"};
# ARRAY OF 100 STRINGS UPPERCASE (ITEM901 - ITEM1000)
[string[]]$HundredItems = 901..1000 | %{"ITEM$_"};
# SUBTRACT THE SECOND ARRAY FROM THE FIRST ONE (CASE INSENSITIVELY)
[string[]]$NineHundred = [Linq.Enumerable]::Except($ThousandItems, $HundredItems, [System.StringComparer]::OrdinalIgnoreCase);
$NineHundred;
Which returns the list of 1000 items minus Item901-Item1000
item1
item2
...
item899
item900
As for speed, removing 100 items from a list...
1,000 Items = 1ms
10,000 Items = 2ms
100,000 Items = 12ms
1,000,000 Items = 259ms
10,000,000 Items = 3,008ms
Note: These times are just on the [Linq.Enumerable]::Except() line. So it's just measuring the time taken to subtract one array from the other. It does not measure the time taken to fill the array.
So to apply this to the original poster's example
$original_list = [System.IO.File]::ReadAllLines((Convert-Path .\random-word-350k-wo-quotes.txt));
$filter_words = [System.IO.File]::ReadAllLines((Convert-Path .\no_go_words.txt));
[string[]]$filtered_list = [Linq.Enumerable]::Except($original_list,$filter_words,[System.StringComparer]::OrdinalIgnoreCase);
For this, I literally inserted 350K strings (the MD5 hash of the numbers 1 - 350K) into the original list (uppercase), inserted 10K strings (the MD5 hash of the numbers 1-10K) into the filter words list (lowercase) and ran that code.
There were 340K words in the filtered list, and it only took 260ms to read both files, filter and return the list

Getting error when adding nested hashtable to array in Powershell

I have a nested hashtable with an array and I want to loop through the contents of another array and add that to the nested hashtable. I'm trying to build a Slack message block.
Here's the nested hashtable I want to add to:
$msgdata = #{
blocks = #(
#{
type = 'section'
text = #{
type = 'mrkdwn'
text = '*Services Being Used This Month*'
}
}
#{
type = 'divider'
}
)
}
$rows = [ ['azure vm', 'centralus'], ['azure sql', 'eastus'], ['azure functions', 'centralus'], ['azure monitor', 'eastus2'] ]
$serviceitems = #()
foreach ($r in $rows) {
$servicetext = "*{0}* - {1}" -f $r[1], $r[0]
$serviceitems += #{'type'='section'}
$serviceitems += #{'text'= ''}
$serviceitems.text.Add('type'='mrkdwn')
$serviceitems.text.Add('text'=$servicetext)
$serviceitems += #{'type'='divider'}
}
$msgdata.blocks += $serviceitems
The code is partially working. The hashtables #{'type'='section'} and #{'type'='divider'} get added successfully. Trying to add the nested hashtable of #{'text' = #{ 'type'='mrkdwn' 'text'=$servicetext }} fails with this error:
Line |
24 | $serviceitems.text.Add('type'='mrkdwn')
| ~
| Missing ')' in method call.
I tried looking through various Powershell posts and couldn't find one that applies to my specific situation. I'm brand new to using hashtables in Powershell.
Complementing mklement0's helpful answer, which solves the problem with your existing code, I suggest the following refactoring, using inline hashtables:
$serviceitems = foreach ($r in $rows) {
#{
type = 'section'
text = #{
type = 'mrkdwn'
text = "*{0}* - {1}" -f $r[1], $r[0]
}
}
#{
type = 'divider'
}
}
$msgdata.blocks += $serviceitems
This looks much cleaner and thus easier to maintain in my opinion.
Explanations:
$serviceitems = foreach ... captures all output (to the success stream) of the foreach loop in variable $serviceitems. PowerShell automatically creates an array from the output, which is more efficient than manually adding to an array using the += operator. Using += PowerShell has to recreate an array of the new size for each addition, because arrays are actually of fixed size. When PowerShell automatically creates an array, it uses a more efficient data structure internally.
By writing out an inline hash table, without assigning it to a variable, PowerShell implicitly outputs the data, in effect adding it to the $serviceitems array.
We output two hash tables per loop iteration, so PowerShells adds two array elements to $serviceitems per loop iteration.
Note:
This answer addresses your question as asked, specifically its syntax problems.
For a superior solution that bypasses the original problems in favor of streamlined code, see zett42's helpful answer.
$serviceitems.text.Add('type'='mrkdwn') causes a syntax error.
Generally speaking, IF $serviceitems.text referred to a hashtable (dictionary), you need either:
method syntax with distinct, ,-separated arguments:
$serviceitems.text.Add('type', 'mrkdwn')
or index syntax (which would quietly overwrite an existing entry, if present):
$serviceitems.text['type'] = 'mrkdwn'
PowerShell even lets you access hashtable (dictionary) entries with member-access syntax (dot notation):
$serviceitems.text.type = 'mrkdwn'
In your specific case, additional considerations come into play:
You're accessing a hashtable via an array, instead of directly.
The text entry you're trying to target isn't originally a nested hashtable, so you cannot call .Add() on it; instead, you must assign a new hashtable to it.
Therefore:
# Define an empty array
$serviceItems = #()
# "Extend" the array by adding a hashtable.
# Note: Except with small arrays, growing them with +=
# should be avoided, because a *new* array must be allocated
# every time.
$serviceItems += #{ text = '' }
# Refer to the hashtable via the array's last element (-1),
# and assign a nested hashtable to it.
$serviceItems[-1].text = #{ 'type' = 'mrkdwn' }
# Output the result.
$serviceItems

finding index of key in an ordered dictionary in powershell

I am having a little bit of trouble with hashtables/dictionaries in powershell. The most recent roadblock is the ability to find the index of a key in an ordered dictionary.
I am looking for a solution that isn't simply iterating through the object.
(I already know how to do that)
Consider the following example:
$dictionary = [Ordered]#{
'a' = 'blue';
'b'='green';
'c'='red'
}
If this were a normal array I'd be able to look up the index of an entry by using IndexOf().
[array]::IndexOf($dictionary,'c').
That would return 2 under normal circumstances.
If I try that with an ordered dictionary, though, I get -1.
Any solutions?
Edit:
In case anyone reading over this is wondering what I'm talking about. What I was trying to use this for was to create an object to normalize property entries in a way that also has a numerical order.
I was trying to use this for the status of a process, for example:
$_processState = [Ordered]#{
'error' = 'error'
'none' = 'none'
'started' = 'started'
'paused' = 'paused'
'cleanup' = 'cleanup'
'complete' = 'complete'
}
If you were able to easily do this, the above object would give $_processState.error an index value of 0 and ascend through each entry, finally giving $_processState.complete an index value of 5. Then if you compared two properties, by "index value", you could see which one is further along by simple operators. For instance:
$thisObject.Status = $_processState.complete
If ($thisObject.Status -ge $_processState.cleanup) {Write-Host 'All done!'}
PS > All done!
^^that doesn't work as is, but that's the idea. It's what I was aiming for. Or maybe to find something like $_processState.complete.IndexNumber()
Having an object like this also lets you assign values by the index name, itself, while standardizing the options...
$thisObject.Status = $_processState.paused
$thisObject.Status
PS > paused
Not really sure this was the best approach at the time or if it still is the best approach with all the custom class options there are available in PS v5.
It can be simpler
It may not be any more efficient than the answer from Frode F., but perhaps more concise (inline) would be simply putting the hash table's keys collection in a sub expression ($()) then calling indexOf on the result.
For your hash table...
Your particular expression would be simply:
$($dictionary.keys).indexOf('c')
...which gives the value 2 as you expected. This also works just as well on a regular hashtable... unless the hashtable is modified in pretty much any way, of course... so it's probably not very useful in that case.
In other words
Using this hash table (which also shows many of the ways to encode 4...):
$hashtable = [ordered]#{
sample = 'hash table'
0 = 'hello'
1 = 'goodbye'
[char]'4' = 'the ansi character 4 (code 52)'
[char]4 = 'the ansi character code 4'
[int]4 = 'the integer 4'
'4' = 'a string containing only the character 4'
5 = "nothing of importance"
}
would yield the following expression/results pairs:
# Expression Result
#------------------------------------- -------------
$($hashtable.keys).indexof('5') -1
$($hashtable.keys).indexof(5) 7
$($hashtable.keys).indexof('4') 6
$($hashtable.keys).indexof([char]4) 4
$($hashtable.keys).indexof([int]4) 5
$($hashtable.keys).indexof([char]'4') 3
$($hashtable.keys).indexof([int][char]'4') -1
$($hashtable.keys).indexof('sample') 0
by the way:
[int][char]'4' equals [int]52
[char]'4' has a "value" (magnitude?) of 52, but is a character, so it's used as such
...gotta love the typing system, which, while flexible, can get really really bad at times, if you're not careful.
Dictionaries uses keys and not indexes. OrderedDictionary combines a hashtable and ArrayList to give you order/index-support in a dictionary, however it's still a dictionary (key-based) collection.
If you need to get the index of an object in a OrderedDictionary (or a hasthable) you need to use foreach-loop and a counter. Example (should be created as a function):
$hashTable = [Ordered]#{
'a' = 'blue';
'b'='green';
'c'='red'
}
$i = 0
foreach($key in $hashTable.Keys) {
if($key -eq "c") { $i; break }
else { $i++ }
}
That's how it works internaly too. You can verify this by reading the source code for OrderedDictionary's IndexOfKey method in .NET Reference Source
For the initial problem I was attempting to solve, a comparable process state, you can now use Enumerations starting with PowerShell v5.
You use the Enum keyword, set the Enumerators by name, and give them an integer value. The value can be anything, but I'm using ascending values starting with 0 in this example:
Enum _ProcessState{
Error = 0
None = 1
Started = 2
Paused = 3
Cleanup = 4
Complete = 5
Verified = 6
}
#the leading _ for the Enum is just cosmetic & not required
Once you've created the Enum, you can assign it to variables. The contents of the variable will return the text name of the Enum, and you can compare them as if they were integers.
$Item1_State = [_ProcessState]::Started
$Item2_State = [_ProcessState]::Cleanup
#return state of second variable
$Item2_state
#comparison
$Item1_State -gt $Item2_State
Will return:
Cleanup
False
If you wanted to compare and return the highest:
#sort the two objects, then return the first result (should return the item with the largest enum int)
$results = ($Item1_State,$Item2_State | Sort-Object -Descending)
$results[0]
Fun fact, you can also use arithmetic on them, for example:
$Item1_State + 1
$Item1_State + $Item2_State
Will return:
Paused
Verified
More info on Enum here:
https://blogs.technet.microsoft.com/heyscriptingguy/2015/08/26/new-powershell-5-feature-enumerations/
https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_enum?view=powershell-6
https://psdevopsug.scot/post/working-with-enums-in-powershell/

Issues importing csv column and replacing it from hash value

Please note that this data has been cleaned to prevent identifying information and considerable white space has been removed from between the commas in order to aid in readability. Lastly at the end of the TYPE column there is an additional line saying how many lines were exported which hopefully will be ignored by the script.
TYPE ,DATE ,TIME ,STREET ,CROSS-STREET ,X-COORD ,Y-COORD
459 ,2015-05-03 00:00:00.000,00:58:35,FOO DR ,A RD/B CT , 0.0, 0.0
488 ,2015-05-03 00:00:00.000,02:31:54,BAR AV ,C ST/D ST , 0.0, 0.0
I am attempting to import this CSV using Import-CSV, convert the TYPE numeric codes into different strings. An example would be 459 becomes Apple. 488 becomes Banana and so forth. I have created a hash with the TYPE numbers as the key and the value being what I want it changed to.
So my issue is really two-fold; I have been so far unable to get the TYPE CSV column to import into the script (I've been trying an array for the most part) and I am not sure the best way to build the logic to check the array data against my hash keys and replace it with the appropriate value.
# declare filename to modify
$strFileName="test.csv"
# import the type data into its own array
$imported_CSV = Import-Csv $strFileName
# populate hash
$conversion_Hash = #{
187 = Homicide;
211 = Robbery;
245 = Assault;
451 = Arson;
459 = Burglary;
484 = Larceny;
487 = Grand Theft;
488 = Petty Theft;
10851 = Stolen Vehicle;
HS = Drug;
}
# perform the conversion
foreach ($record in $imported_CSV)
{
$conversion_Hash[$record.Type]
}
This has no logic and just contains the code that was presented in the answer below. Note that I addressed that it doesn't work in the comments below.
I think this is an example of what you are looking for:
$hashTable = #{459= Apple; 488= Banana;}
$csv = import-csv <file>
foreach($record in $csv)
{
$hashTable[$record.Type] #returns hash value
}
Output:
Apple
Banana
So we have several little issues here. The two big ones are your source file and the your hashtable keys are integers and not strings.
# declare filename to modify
$strFileName="c:\temp\point.csv"
# import the type data into its own array
$imported_CSV = (Get-Content $strFileName) -replace "\s*,\s*","," | ConvertFrom-Csv
# populate hash
$conversion_Hash = #{
"187" = "Homicide";
"211" = "Robbery";
"245" = "Assault";
"451" = "Arson";
"459" = "Burglary";
"484" = "Larceny";
"487" = "Grand Theft";
"488" = "Petty Theft";
"10851" = "Stolen Vehicle";
"HS" = "Drug";
}
# perform the conversion
foreach ($record in $imported_CSV)
{
$conversion_Hash[$record.Type]
}
Output from naughty people
Burglary
Petty Theft
I don't know if your source file looks like it does in your question but there is a bunch of whitespace there that will be giving you a hassle. Namely you dont have a TYPE column but a "TYPE " (without the spaces). Same goes for the other columns. Data is affected as well. It's not 459 but "459 "(without the spaces).
To fix that I check the file and replace all space surrounding the commas with just the comma.
TYPE,DATE,TIME,STREET,CROSS-STREET,X-COORD,Y-COORD
459,2015-05-03 00:00:00.000,00:58:35,FOO DR,A RD/B CT,0.0,0.0
488,2015-05-03 00:00:00.000,02:31:54,BAR AV,C ST/D ST,0.0,0.0
If your data already looks like that then you need to be careful posting this stuff in your question. Onto the other issue with your comparison
You will see I have quoted almost everything in that hashtable. I had to for the values as they were being taken as commands otherwise. I also quoted the keys as the csv table contains string and not integers. I would have just casted to [int] to avoid the whole issue but one of your keys is called "HS" which does not look like a number to me :).
What I might have done
Just to play a little I might have added another note property to the list called TypeAsString which would add a column.
# perform the conversion
$imported_CSV | ForEach-Object{
$_ | Add-Member -MemberType NoteProperty -Name "TypeAsString" -Value $conversion_Hash[$_.Type] -PassThru
}
So the output from one item would look like this
TYPE : 459
DATE : 2015-05-03 00:00:00.000
TIME : 00:58:35
STREET : FOO DR
CROSS-STREET : A RD/B CT
X-COORD : 0.0
Y-COORD : 0.0
TypeAsString : Burglary
I could have made a more dynamic property like a script property, so that changes in $conversion_Hash are updated instantly, but this should suffice for what you need.

Returning a Row Count for a DataRow in PowerShell

My script is populating a datarow from a stored procedure in SQL Server. I then reference specific columns in this datarow throughout the script. What I'm trying to do is add functionality that takes action X if the row count = 0, action Y if the row count = 1, and action Z if the row count > 1.
-- PowerShell script snippet
# $MyResult is populated earlier;
# GetType() returns Name=DataRow, BaseType=System.Object
# this works
ForEach ($MyRow In $MyResult) {
$MyFile = Get-Content $MyRow.FileName
# do other cool stuff
}
# this is what I'm trying to do, but doesn't work
If ($MyResult.Count -eq 0) {
# do something
}
ElseIf ($MyResult.Count -eq 1) {
# do something else
}
Else {
# do this instead
}
I can get $MyResult.Count to work if I'm using an array, but then I can't reference $MyRow.FileName directly.
This is probably pretty simple, but I'm new to PowerShell and object-oriented languages. I've tried searching this site, The Scripting Guy's blog, and Google, but I haven't been able to find anything that shows me how to do this.
Any help is much appreciated.
It has everything to do with how you populate $MyResult. If you query the database like
$MyResult = #( << code that returns results from database >> )
that is, enclosing the code that returns your dataset/datatable from the database within #( ... ), then number of rows returned will be easily checked using $MyResult.count.
Your original code should work as-is if you populate $MyResult this way.
I know this thread is old, but if someone else finds it on Google, this should work also on PS V5:
Replace $MyResult.Count with: ($MyResult | Measure-Object | select -ExpandProperty Count)
For Example:
If (($MyResult | Measure-Object | select -ExpandProperty Count) -eq 0)
I don't have experience with PS and SQL, but I'll try to provide an answer for you. If you're object $myresult is a datarow-object, it means you only got the one row. If the results are empty, then $myresult will usually be null.
If you get one or more rows, you can put them in an array and count it. However, if your $myresult are null, and you put it in an array it will still count as one, so we need to watch out for that. Try this:
If ($MyResult -eq $null) {
# do something if no rows
}
Else If (#($MyResult).Count -eq 1) {
# do something else if there are 1 rows.
# The cast to array was only in the if-test,
# so you can reach the object with $myresult.
}
Else {
# do this if there are multiple rows.
}
Looks like this question gets a lot of views, so I wanted to post how I handled this. :)
Basically, the fix for me was to change the method I was using to execute a query on SQL Server. I switched to Chad Miller's Invoke-SqlCmd2 script: TechNet: Invoke-SqlCmd2, i.e.
# ---------------
# this code works
# ---------------
# Register the function
. .\Invoke-Sqlcmd2.ps1
# make SQL Server call & store results to an array, $MyResults
[array]$MyResults = Invoke-Sqlcmd2 -Serve
rInstance "(local)" -Query "SELECT TOP 1 * FROM sys.databases;"
If ($MyResult -eq $null) {
# do something
}
ElseIf ($MyResult.Count -eq 1) {
# do something else
}
Else {
# do this instead
}