Partial/near match for name and/or username in Active Directory / PowerShell

Our users sometimes give us misspelled names/usernames, and I would like to be able to search Active Directory for a near match, sorted by closeness (any algorithm would be fine).
For example, if I try
Get-Aduser -Filter {GivenName -like "Jack"}
I can find the user Jack, but not if I use "Jacck" or "ack"
Is there a simple way to do this?

You can calculate the Levenshtein distance (LD) between the two strings and check that it's under a certain threshold (probably 1 or 2). There is a PowerShell example here:
Levenshtein distance in powershell
Examples:
Jack and Jacck have an LD of 1.
Jack and ack have an LD of 1.
Palle and Havnefoged have an LD of 8.
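For reference, here is a minimal sketch of the classic dynamic-programming Levenshtein distance. It is written from scratch here and is not the code from the linked post; the function name Get-LevenshteinDistanceSketch is hypothetical, and lower-casing both strings first is an assumption that suits name matching. It reproduces the three distances listed above.

```powershell
function Get-LevenshteinDistanceSketch {
    # Hypothetical helper: edit distance where an insertion, a deletion
    # or a substitution each costs 1
    param(
        [Parameter(Mandatory)][AllowEmptyString()][string]$First,
        [Parameter(Mandatory)][AllowEmptyString()][string]$Second
    )

    # Assumption: differences in letter case should not count as edits
    $a = $First.ToLowerInvariant()
    $b = $Second.ToLowerInvariant()

    # Keep only two rows of the DP matrix: the previous row and the current row
    $prev = [int[]](0..$b.Length)
    $curr = [int[]]::new($b.Length + 1)

    for ($i = 1; $i -le $a.Length; $i++) {
        $curr[0] = $i
        for ($j = 1; $j -le $b.Length; $j++) {
            $cost = if ($a[$i - 1] -ceq $b[$j - 1]) { 0 } else { 1 }
            $deletion     = $prev[$j] + 1
            $insertion    = $curr[$j - 1] + 1
            $substitution = $prev[$j - 1] + $cost
            $curr[$j] = [Math]::Min([Math]::Min($deletion, $insertion), $substitution)
        }
        # The current row becomes the previous row for the next character of $a
        $prev, $curr = $curr, $prev
    }
    $prev[$b.Length]
}

# Usage: keep names within an edit distance of 2 of the misspelled search term
$names = 'Jack', 'Jane', 'John', 'Jacob'
$names | Where-Object { (Get-LevenshteinDistanceSketch 'Jacck' $_) -le 2 }
```

Only two rows of the matrix are kept, so memory grows with the length of one string rather than with the product of both lengths.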

Interesting question and answers. But a possible simpler solution is to search by more than one attribute as I would hope most people would spell one of their names properly :)
Get-ADUser -Filter {GivenName -like "FirstName" -or SurName -Like "SecondName"}

The Soundex algorithm is designed for just this situation. Here is some PowerShell code that might help:
Get-Soundex.ps1
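A minimal sketch of American Soundex, in case the linked script is unavailable. It is not the code from Get-Soundex.ps1, and the function name is hypothetical. Names that sound alike get the same four-character code, so "Jack" and "Jacck" both become J200. Because the first letter is kept as-is, a dropped initial such as "ack" will not match "Jack"; that case needs an edit-distance measure instead.

```powershell
function Get-SoundexSketch {
    # Hypothetical helper: American Soundex code (first letter + three digits)
    param([Parameter(Mandatory)][string]$Name)

    # Assumption: only the ASCII letters A-Z are coded; everything else is dropped
    $letters = $Name.ToUpperInvariant() -replace '[^A-Z]', ''
    if ($letters.Length -eq 0) { return '' }

    # Digit per consonant group; vowels and Y map to '0' (they separate repeated codes),
    # H and W map to '' (they are skipped and do NOT separate repeated codes)
    $codes = @{
        B = '1'; F = '1'; P = '1'; V = '1'
        C = '2'; G = '2'; J = '2'; K = '2'; Q = '2'; S = '2'; X = '2'; Z = '2'
        D = '3'; T = '3'; L = '4'; M = '5'; N = '5'; R = '6'
        A = '0'; E = '0'; I = '0'; O = '0'; U = '0'; Y = '0'
        H = ''; W = ''
    }

    $result   = $letters.Substring(0, 1)
    $lastCode = $codes[$result]
    for ($i = 1; $i -lt $letters.Length -and $result.Length -lt 4; $i++) {
        $code = $codes[$letters.Substring($i, 1)]
        if ($code -eq '') { continue }          # H/W: ignore entirely
        if ($code -ne '0' -and $code -ne $lastCode) { $result += $code }
        $lastCode = $code                       # a vowel resets the "previous code"
    }
    # Pad short codes with zeros, e.g. "J2" becomes "J200"
    $result.PadRight(4, '0')
}

# Usage: names that sound like "Jacck"
$target = Get-SoundexSketch 'Jacck'
$sameSound = 'Jack', 'Jacck', 'John', 'Jake' | Where-Object { (Get-SoundexSketch $_) -eq $target }
$sameSound
```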

OK, based on the great answers I got (thanks @boxdog and @Palle Due), I am posting a more complete one.
Major source: https://github.com/gravejester/Communary.PASM - PowerShell Approximate String Matching. A great module for this topic.
1) FuzzyMatchScore function
source: https://github.com/gravejester/Communary.PASM/tree/master/Functions
# download functions to the temp folder
$urls =
"https://raw.githubusercontent.com/gravejester/Communary.PASM/master/Functions/Get-CommonPrefix.ps1" ,
"https://raw.githubusercontent.com/gravejester/Communary.PASM/master/Functions/Get-LevenshteinDistance.ps1" ,
"https://raw.githubusercontent.com/gravejester/Communary.PASM/master/Functions/Get-LongestCommonSubstring.ps1" ,
"https://raw.githubusercontent.com/gravejester/Communary.PASM/master/Functions/Get-FuzzyMatchScore.ps1"
$paths = $urls | %{$_.split("\/")|select -last 1| %{"$env:TEMP\$_"}}
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
for($i=0;$i -lt $urls.count;$i++){
Invoke-WebRequest -Uri $urls[$i] -OutFile $paths[$i]
}
# concatenating the functions so we don't have to deal with source permissions
foreach($path in $paths){
cat $path | Add-Content "$env:TEMP\Fuzzy_score_functions.ps1"
}
# to save for later, open the temp folder with: Invoke-Item $env:TEMP
# then copy "Fuzzy_score_functions.ps1" somewhere else
# source Fuzzy_score_functions.ps1
. "$env:TEMP\Fuzzy_score_functions.ps1"
Simple test:
Get-FuzzyMatchScore "a" "abc" # 98
Create a score function:
## start function
function get_score{
param($searchQuery,$searchData,$nlist,[switch]$levd)
if($nlist -eq $null){$nlist = 10}
$scores = foreach($string in $searchData){
Try{
if($levd){
$score = Get-LevenshteinDistance $searchQuery $string }
else{
$score = Get-FuzzyMatchScore -Search $searchQuery -String $string }
Write-Output (,([PSCustomObject][Ordered] @{
Score = $score
Result = $string
}))
$I = $searchData.indexof($string)/$searchData.count*100
$I = [math]::Round($I)
Write-Progress -Activity "Search in Progress" -Status "$I% Complete:" -PercentComplete $I
}Catch{Continue}
}
if($levd) { $scores | Sort-Object Score,Result |select -First $nlist }
else {$scores | Sort-Object Score,Result -Descending |select -First $nlist }
} ## end function
Examples
get_score "Karolin" @("Kathrin","Jane","John","Cameron")
# check the difference between Fuzzy and LevenshteinDistance mode
$names = "Ferris","Cameron","Sloane","Jeanie","Edward","Tom","Katie","Grace"
"Fuzzy"; get_score "Cam" $names
"Levenshtein"; get_score "Cam" $names -levd
Test the performance on a big dataset
## download baby-names
$url = "https://github.com/hadley/data-baby-names/raw/master/baby-names.csv"
$output = "$env:TEMP\baby-names.csv"
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
Invoke-WebRequest -Uri $url -OutFile $output
$babynames = import-csv "$env:TEMP\baby-names.csv"
$babynames.count # 258000 lines
$babynames[0..3] # year, name, percent, sex
$searchdata = $babynames.name[0..499]
$query = "Waren" # missing letter
"Fuzzy"; get_score $query $searchdata
"Levenshtein"; get_score $query $searchdata -levd
$query = "Jon" # missing letter
"Fuzzy"; get_score $query $searchdata
"Levenshtein"; get_score $query $searchdata -levd
$query = "Howie" # lookalike
"Fuzzy"; get_score $query $searchdata;
"Levenshtein"; get_score $query $searchdata -levd
Test
$query = "John"
$res = for($i=1;$i -le 10;$i++){
$searchdata = $babynames.name[0..($i*100-1)]
$meas = measure-command{$res = get_score $query $searchdata}
write-host $i
Write-Output (,([PSCustomObject][Ordered] @{
N = $i*100
MS = $meas.Milliseconds
MS_per_line = [math]::Round($meas.Milliseconds/$searchdata.Count,2)
}))
}
$res
+------+-----+-------------+
| N | MS | MS_per_line |
| - | -- | ----------- |
| 100 | 696 | 6.96 |
| 200 | 544 | 2.72 |
| 300 | 336 | 1.12 |
| 400 | 6 | 0.02 |
| 500 | 718 | 1.44 |
| 600 | 452 | 0.75 |
| 700 | 224 | 0.32 |
| 800 | 912 | 1.14 |
| 900 | 718 | 0.8 |
| 1000 | 417 | 0.42 |
+------+-----+-------------+
These times are quite crazy; if anyone understands why, please comment on it.
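One likely reason the numbers jump around: the test loop reads `$meas.Milliseconds`. On a TimeSpan that is only the milliseconds component (0 to 999) of the elapsed time, so a run of 1.5 seconds reports 500, while `TotalMilliseconds` holds the whole duration. The snippet below only demonstrates the difference; the replacement lines in the comments are a suggestion, not re-measured results. Separately, get_score calls `$searchData.indexof()` and Write-Progress for every item, and IndexOf scans the list each time, so that overhead grows faster than the list itself.

```powershell
# A TimeSpan of 1.5 seconds
$ts = [TimeSpan]::FromMilliseconds(1500)
$ts.Milliseconds        # 500: only the milliseconds component (0-999)
$ts.TotalMilliseconds   # 1500: the whole duration expressed in milliseconds

# In the test loop, the two timing lines would become:
#   MS          = [math]::Round($meas.TotalMilliseconds)
#   MS_per_line = [math]::Round($meas.TotalMilliseconds / $searchdata.Count, 2)
```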
2) Generate a table of Names from Active Directory
The best way to do this depends on the organization of the AD. Here we have many OUs, but common users will be in Users and DisabledUsers. Also Domain and DC will be different (I'm changing ours here to <domain> and <DC>).
# One way to get a List of OUs
Get-ADOrganizationalUnit -Filter * -Properties CanonicalName |
Select-Object -Property CanonicalName
Then you can use Where-Object -FilterScript {} to filter per OU:
# example, saving on the temp folder
Get-ADUser -Filter * |
Where-Object -FilterScript {
# [^,]+ also matches CNs that contain spaces, e.g. "CN=John Doe"
($_.DistinguishedName -match "CN=[^,]+,OU=DisabledUsers,DC=<domain>,DC=<DC>" -or
$_.DistinguishedName -match "CN=[^,]+,OU=Users,DC=<domain>,DC=<DC>") -and
$null -ne $_.GivenName # remove users without a given name, like test users
} |
select @{n="Fullname";e={$_.GivenName+" "+$_.Surname}},
GivenName,Surname,SamAccountName |
Export-CSV -Path "$env:TEMP\all_Users.csv" -NoTypeInformation
# you can open the file to inspect
Invoke-Item "$env:TEMP\all_Users.csv"
# import
$allusers = Import-Csv "$env:TEMP\all_Users.csv"
$allusers.Count # number of lines
Usage:
get_score "Jane Done" $allusers.fullname 15 # return the 15 first
get_score "jdoe" $allusers.samaccountname 15

Powershell script very slow when running with very big array of data

I've been dabbling with PowerShell for a while now, and I've been trying to modify some data in an array.
The problem is that my source array is very large and this script takes hours to run. Maybe someone can help me optimize my script.
With a small source array the script runs just fine, by the way.
$array_metric_hour = @()
$array_metric_hour =
foreach ($resource in $resources) {
Write-Progress -Id 0 "Step $resource"
foreach ($hour in $Time_Array) {
Write-Progress -Id 1 -ParentId 0 "Step $resource - Substep" ($hour.timestamp+":00")
[pscustomobject] @{
resourceID = $resource
resourceName = $array_bill.resources.($resource).name
time = $hour.timestamp+":00"
Poweredon = ((($Array_combined | Where-Object {$_.resourceID -eq $resource -and $_.hour -eq $hour.timestamp}).poweredon | Measure-Object -Maximum).Maximum)
#Cpu_On = if (($Array_combined | Where-Object {$_.resourceID -eq $resource -and $_.hour -eq $hour.timestamp -and $_.poweredOn -eq "0,0"}).poweredon) {0} else {(($Array_combined | Where-Object {$_.resourceID -eq $resource -and $_.hour -eq $hour.timestamp -and $_.poweredOn -ne "0,0"}).provisionedCpu | Measure-Object -Maximum).Maximum}
Mem_GB_On = if (($Array_combined | Where-Object {$_.resourceID -eq $resource -and $_.hour -eq $hour.timestamp -and $_.poweredOn -eq "0,0"}).poweredon) {0} else {(($Array_combined | Where-Object {$_.resourceID -eq $resource -and $_.hour -eq $hour.timestamp -and $_.poweredOn -ne "0,0"}).provisionedMem_GB | Measure-Object -Maximum).Maximum}
hardware_Diskspace_GB = ((($Array_combined | Where-Object {$_.resourceID -eq $resource -and $_.hour -eq $hour.timestamp}).hardware_Diskspace_GB | Measure-Object -Maximum).Maximum)
#used_Diskspace_GB = ((($Array_combined | Where-Object {$_.resourceID -eq $resource -and $_.hour -eq $hour.timestamp}).used_Diskspace_GB | Measure-Object -Maximum).Maximum)
}
}
}
Some extra information that is required:
$Time_array has every full hour in a month, so 745 values in this case.
$array_combined consists of 98131 lines (metrics at a 5-minute interval over a month).
This array has the following properties per interval:
resourceID
resourceName
timestamps
human_timestamp
hour
date
poweredOn
provisionedMem_GB
hardware_Diskspace_GB
used_Diskspace_GB
Thanks for all the comments; next time I'll try to supply complete and correct information.
The suggestion of creating an extra filter was the winner for me. The script is 50 times faster in its current state, and for now that is quick enough.
Added $filter1 and $filter2.
$array_metric_hour = @()
$array_metric_hour =
foreach ($resource in $resources) {
$filter1 = $Array_combined | Where-Object {$_.resourceID -eq $resource}
Write-Progress -Id 0 "Step $resource"
foreach ($hour in $Time_Array) {
$filter2 = $filter1 | Where-Object {$_.hour -eq $hour.timestamp}
Write-Progress -Id 1 -ParentId 0 "Step $resource - Substep" ($hour.timestamp+":00")
[pscustomobject] @{
resourceID = $resource
resourceName = $array_bill.resources.($resource).name
time = $hour.timestamp+":00"
Poweredon = if ($filter2 | where poweredOn -eq "0,0") {"0"} else {($filter2.poweredOn | Measure-Object -Maximum).Maximum}
Mem_GB_On = if ($filter2 | where poweredOn -eq "0,0") {0} else {(($filter2).provisionedMem_GB | Measure-Object -Average).Average}
hardware_Diskspace_GB = ((($filter2).hardware_Diskspace_GB | Measure-Object -Average).Average)
}
}
}
Since you still haven't provided much more of your code (e.g. where does "Array_combined" come from?), here are some important notes:
Don't call "Write-Progress" on every iteration! It has a very large impact on performance in PS <= 5.1, 6 and 7.
With the current 7.1 build I am using ("7.1.0-preview.7") it works like a charm; I still have to check when they fixed it.
Avoid piping as much as you can when you want the best performance. Streaming data from one command to the next is really slow compared to statements like "foreach {}" (NOT "ForEach-Object"!).
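A rough way to check the piping claim on your own machine. The data (20,000 integers) and the even-number filter are made up, and absolute timings depend heavily on the PowerShell version and hardware, so no numbers are quoted here; in typical runs the foreach statement and the .Where() method (PowerShell 4.0+) finish well ahead of the Where-Object pipeline.

```powershell
# Made-up data: 20,000 integers; keep the even ones with three different constructs
$data = 1..20000

$filters = [ordered]@{
    'Where-Object pipeline' = { $data | Where-Object { $_ % 2 -eq 0 } }
    '.Where() method'       = { $data.Where({ $_ % 2 -eq 0 }) }
    'foreach statement'     = { foreach ($n in $data) { if ($n % 2 -eq 0) { $n } } }
}

foreach ($name in $filters.Keys) {
    $block = $filters[$name]
    # Discard the output so only the filtering itself is timed
    $elapsed = Measure-Command { $null = & $block }
    '{0,-22} {1,8:N0} ms' -f $name, $elapsed.TotalMilliseconds
}
```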
Here is an example based on your template, even though some important steps are missing:
# Progress bar definition
$progressActivity1 = 'Processing items'
$progressCounter1 = -1
$progressMax1 = @($resources).Count
$progressInterval1 = [math]::Ceiling($progressMax1 * 0.1) # each 10%
$progressId1 = 1
$progressParentId1 = 0
# *** use a list if your script adds objects several times.
# *** Note: arrays have a fixed size, so adding to one with "+=" re-creates the whole array each time
$array_metric_hour = [System.Collections.Generic.List[psobject]]::new()
# *** good approach to add the result of a forEach-statement directly to variable. Performance is similar compared to adding objects to a list.
$array_metric_hour = foreach ($resource in $resources) {
# Progress bar counter & drawing (each 10%)
$progressCounter1++
If ($progressCounter1 % $progressInterval1 -eq 0) {
Write-Progress -Activity $progressActivity1 -PercentComplete($progressCounter1 / $progressMax1 * 100) -Id $progressId1 -ParentId $progressParentId1
}
# *** "Array_combined" is unknown... but according to the usage:
# !!! try to create a dictionary/hashtable of "Array_combined" with "resourceID" as key.
# !!! hash tables/dictionaries are much faster to access a particular item than arrays
# !!! access would be: $filter1 = $Array_combined[$resource]
$filter1 = $Array_combined | Where-Object { $_.resourceID -eq $resource }
# Progress bar definition
$progressActivity2 = 'Processing items'
$progressCounter2 = -1
$progressMax2 = @($Time_Array).Count
$progressInterval2 = [math]::Ceiling($progressMax2 * 0.1) # each 10%
$progressId2 = 2
$progressParentId2 = $progressId1
foreach ($hour in $Time_Array) {
# ??? don't know what $filter1 is about ...
# !!! replace that; use a hashtable/dictionary
# !!! alternatively: use "foreach"-statement OR method ".where{}" which was introduced in PS 4.0
$filter2 = $filter1 | Where-Object { $_.hour -eq $hour.timestamp }
# Progress bar counter & drawing (each 10%)
$progressCounter2++
If ($progressCounter2 % $progressInterval2 -eq 0) {
Write-Progress -Activity $progressActivity2 -PercentComplete($progressCounter2 / $progressMax2 * 100) -Id $progressId2 -ParentId $progressParentId2
}
[pscustomobject] @{
resourceID = $resource
resourceName = $array_bill.resources.($resource).name
time = $hour.timestamp + ':00'
# ??? "Where-Object" could be replaced ... but don't know the background or data ....
# !!! replace "Measure-Object" with "[Linq.Enumerable]" methods if possible
Poweredon = if ($filter2 | Where-Object poweredOn -EQ '0,0') { '0' } else { ($filter2.poweredOn | Measure-Object -Maximum).Maximum }
# !!! same as above
Mem_GB_On = if ($filter2 | Where-Object poweredOn -EQ '0,0') { 0 } else { (($filter2).provisionedMem_GB | Measure-Object -Average).Average }
# !!! same as above
hardware_Diskspace_GB = ((($filter2).hardware_Diskspace_GB | Measure-Object -Average).Average)
}
}
}
# Progress completed
Write-Progress -Activity $progressActivity1 -Completed -Id $progressId1
Write-Progress -Activity $progressActivity2 -Completed -Id $progressId2
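The comments in the example suggest replacing the Where-Object scans with a hashtable/dictionary. A minimal sketch of that idea: build the index once, in a single pass over $Array_combined, keyed on resourceID and hour together, then look up each resource/hour pair directly inside the nested loops. The sample rows, the "resourceID|hour" key format and $index are assumptions; only one of the question's columns (Mem_GB_On, averaged as in the asker's faster version) is shown, and the "0,0" powered-off check would apply to $rows exactly as it did to $filter2.

```powershell
# Made-up sample rows standing in for $Array_combined (5-minute samples)
$Array_combined = @(
    [pscustomobject]@{ resourceID = 'vm1'; hour = '2020-10-01 10'; provisionedMem_GB = 4 }
    [pscustomobject]@{ resourceID = 'vm1'; hour = '2020-10-01 10'; provisionedMem_GB = 4 }
    [pscustomobject]@{ resourceID = 'vm1'; hour = '2020-10-01 11'; provisionedMem_GB = 8 }
    [pscustomobject]@{ resourceID = 'vm2'; hour = '2020-10-01 10'; provisionedMem_GB = 2 }
)
$resources  = 'vm1', 'vm2'
$Time_Array = @(
    [pscustomobject]@{ timestamp = '2020-10-01 10' }
    [pscustomobject]@{ timestamp = '2020-10-01 11' }
)

# One pass over the big array: key = "resourceID|hour", value = list of matching rows
$index = @{}
foreach ($row in $Array_combined) {
    $key = '{0}|{1}' -f $row.resourceID, $row.hour
    if (-not $index.ContainsKey($key)) {
        $index[$key] = [System.Collections.Generic.List[object]]::new()
    }
    $index[$key].Add($row)
}

# The nested loops now do one hashtable lookup instead of two Where-Object scans
$array_metric_hour = foreach ($resource in $resources) {
    foreach ($hour in $Time_Array) {
        $rows = $index["$resource|$($hour.timestamp)"]
        if ($null -eq $rows) { continue }   # no samples for this resource and hour
        [pscustomobject]@{
            resourceID = $resource
            time       = $hour.timestamp + ':00'
            Mem_GB_On  = ($rows.provisionedMem_GB | Measure-Object -Average).Average
        }
    }
}
$array_metric_hour
```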

Powershell: How to add values to Object with only 1 header

I have a PowerShell script where I'm getting alarm groups from several files. I'm trying to add them to an object, but the problem is that for every new file another header is added to the object. Is it possible to add the header only once, append my data to the object, and sort the whole object when it's finished?
My code:
$rootPath = $PSScriptRoot
if ($rootPath -eq "") {
$rootPath = Split-Path -Parent -Path $MyInvocation.MyCommand.Definition
}
$alarmPath = "$rootPath\Alarmgroups"
$mdi_alarms_template = "$rootPath\tmpl\mdi-alarms.tmpl.html"
$mdi_alarms = "$rootPath\mdi-alarms.html"
$fileNames = Get-ChildItem -Path $alarmPath -Filter *.algrp
$AlarmgroupIndexString = $LanguageIDString = $tmpString = $ID_string = $html_output= ""
$MachineCode = "Out_2.Alarm.Current"
$BitNo = $Element = $Format2Element = $Format3BitID =0
$BitID = $Format3TextID = 1
$list = New-Object System.Collections.ArrayList
Clear-Content "$rootPath\test.txt"
Clear-Content "$rootPath\list.txt"
Clear-Content "$rootPath\output.txt"
# Parse each alarm group file in the project
foreach ($file in $fileNames) {
$Content = [xml](Get-Content -Path $file.FullName)
$ns = New-Object System.Xml.XmlNamespaceManager($Content.NameTable)
$ns=@{asdf="http://asdf-automation.co.at/AS/VC/Project"}
$AlarmgroupIndex = Select-Xml -Xml $Content -XPath "//asdf[contains(@Name,'Index')]" -namespace $ns | select -ExpandProperty node
$AlarmgroupIndexString = $AlarmgroupIndex.Value
$AlarmgroupLanguageText = Select-Xml -Xml $Content -XPath "//asdf:TextLayer" -namespace $ns | select -ExpandProperty node
$AlarmgroupIndexMap = Select-Xml -Xml $Content -XPath "//asdf:Index" -namespace $ns | select -ExpandProperty node
$LUT =@{}
$AlarmgroupIndexMap | foreach{
$LUT.($_.ID) = $_.Value
}
$tmpArray =@()
$list = $AlarmgroupLanguageText | foreach{
$LanguageIDString = $_.LanguageId
$AlarmgroupTextLayer = Select-Xml -Xml $Content -XPath "//asdf:TextLayer[@LanguageId='$LanguageIDString']/asdf:Text" -namespace $ns | select -ExpandProperty node
$AlarmgroupTextLayer | foreach{
if($LUT.ContainsKey($_.ID))
{
$ID_string = $LUT[$_.ID]
}
[pscustomobject]@{
Language = $LanguageIDString
GroupID = $AlarmgroupIndexString
TextID = $ID_string #-as [int]
Text = $_.Value
}
$ID_string =""
}
$LanguageIDString=""
}
$list = $list |Sort-Object -Property Language, GroupID, {$_.TextID -as [int]}
# $list = $list |Sort-Object -Property @{Expression={$_.Language}}, @{Expression={$_.TextId}} , @{Expression={$_.TextID -as [int]}}
$list | Out-File "$rootPath\list.txt" -Append -Encoding utf8
}
Output:
GroupID Language TextID Text
------- -------- ------ ----
24 aa Group
24 aa 0
24 aa 1
24 aa 2
24 aa 3
24 aa 4
24 aa 5
24 aa 6
24 aa 7
24 aa 8
24 aa 9
24 aa 10
GroupID Language TextID Text
------- -------- ------ ----
24 ar Group
24 ar 0
24 ar 1
24 ar 2
24 ar 3
24 ar 4
24 ar 5
24 ar 6
24 ar 7
So I have several headers in my output file. Is it possible to remove them, or to add elements to the object without the header? I tried several solutions; nothing worked.
If I understand it correctly, I'm generating an object and adding it to an object with all values, including the header.
[pscustomobject]@{
Language = $LanguageIDString
GroupID = $AlarmgroupIndexString
TextID = $ID_string #-as [int]
Text = $_.Value
}
You can assign all the output from the foreach loop to a single variable and then move the file-write logic to the end of the script where you can output it all at once:
# Assign results of entire `foreach(){}` statement to `$combinedLists`
$combinedLists = foreach ($file in $fileNames) {
# XML navigation + object creation + assignment to `$list` still goes here
# We sort and then instead of assigning the output to a variable directly,
# we just let it "bubble up" from here to the `$combinedLists` assignment
$list |Sort-Object -Property Language, GroupID, {$_.TextID -as [int]}
}
# Now we can write everything to file at once (overwrites existing contents)
$combinedLists |Out-File "$rootPath\list.txt" -Force -Encoding utf8
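If the file only needs to hold the data (for example to open it in Excel or read it back with Import-Csv), Export-Csv is an alternative to Out-File: it writes a single header line taken from the first object's properties, followed by one line per object, no matter how many files contributed rows. A minimal sketch; the sample objects stand in for the sorted per-file lists, and the temp path is made up.

```powershell
# Made-up objects standing in for the sorted lists of two alarm-group files
$combinedLists = foreach ($groupId in 24, 25) {
    foreach ($language in 'aa', 'ar') {
        [pscustomobject]@{ Language = $language; GroupID = $groupId; TextID = 0; Text = "Group $groupId" }
    }
}

# Export-Csv writes one header line, then one line per object
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'list.csv'
$combinedLists | Export-Csv -Path $csvPath -NoTypeInformation -Encoding UTF8
Get-Content $csvPath
```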

How to export two variables into same CSV as joined via PowerShell?

I have a PowerShell script that uses the poshwsus module, like below:
$FileOutput = "C:\WSUSReport\WSUSReport.csv"
$ProcessLog = "C:\WSUSReport\QueryLog2.txt"
$WSUSServers = "C:\WSUSReport\Computers.txt"
$WSUSPort = "8530"
import-module poshwsus
ForEach ($Server in Get-Content $WSUSServers)
{
& connect-poshwsusserver $Server -port $WSUSPort | out-file $ProcessLog -append
$r1 = & Get-PoshWSUSClient | select @{name="Computer";expression={$_.FullDomainName}},@{name="LastUpdated";expression={if ([datetime]$_.LastReportedStatusTime -gt [datetime]"1/1/0001 12:00:00 AM") {$_.LastReportedStatusTime} else {$_.LastSyncTime}}}
$r2 = & Get-PoshWSUSUpdateSummaryPerClient -UpdateScope (new-poshwsusupdatescope) -ComputerScope (new-poshwsuscomputerscope) | Select Computer,NeededCount,DownloadedCount,NotApplicableCount,NotInstalledCount,InstalledCount,FailedCount
}
What I need to do is export a CSV containing the results with these columns (like an "inner join"):
Computer, NeededCount, DownloadedCount, NotApplicableCount, NotInstalledCount, InstalledCount, FailedCount, LastUpdated
I have tried to use the line below in the foreach loop, but it didn't work as I expected.
$r1 + $r2 | export-csv -NoTypeInformation -append $FileOutput
I'd appreciate any help or advice.
EDIT: the output I got:
ComputerName LastUpdate
X A
Y B
X
Y
So there is no error: the first two rows come from $r1 and the last two from $r2, so the tables are not joined as I expected.
Thanks!
I've found my guidance in this post: Inner Join in PowerShell (without SQL)
I modified my query accordingly, as below, and it works like a charm.
$FileOutput = "C:\WSUSReport\WSUSReport.csv"
$ProcessLog = "C:\WSUSReport\QueryLog.txt"
$WSUSServers = "C:\WSUSReport\Computers.txt"
$WSUSPort = "8530"
import-module poshwsus
function Join-Records($tab1, $tab2){
$prop1 = $tab1 | select -First 1 | % {$_.PSObject.Properties.Name} #properties from t1
$prop2 = $tab2 | select -First 1 | % {$_.PSObject.Properties.Name} #properties from t2
$join = $prop1 | ? {$prop2 -Contains $_}
$unique1 = $prop1 | ?{ $join -notcontains $_}
$unique2 = $prop2 | ?{ $join -notcontains $_}
if ($join) {
$tab1 | % {
$t1 = $_
$tab2 | % {
$t2 = $_
foreach ($prop in $join) {
if (!$t1.$prop.Equals($t2.$prop)) { return; }
}
$result = @{}
$join | % { $result.Add($_,$t1.$_) }
$unique1 | % { $result.Add($_,$t1.$_) }
$unique2 | % { $result.Add($_,$t2.$_) }
[PSCustomObject]$result
}
}
}
}
ForEach ($Server in Get-Content $WSUSServers)
{
& connect-poshwsusserver $Server -port $WSUSPort | out-file $ProcessLog -append
$r1 = & Get-PoshWSUSClient | select @{name="Computer";expression={$_.FullDomainName}},@{name="LastUpdated";expression={if ([datetime]$_.LastReportedStatusTime -gt [datetime]"1/1/0001 12:00:00 AM") {$_.LastReportedStatusTime} else {$_.LastSyncTime}}}
$r2 = & Get-PoshWSUSUpdateSummaryPerClient -UpdateScope (new-poshwsusupdatescope) -ComputerScope (new-poshwsuscomputerscope) | Select Computer,NeededCount,DownloadedCount,NotApplicableCount,NotInstalledCount,InstalledCount,FailedCount
Join-Records $r1 $r2 | Select Computer,NeededCount,DownloadedCount,NotApplicableCount,NotInstalledCount,InstalledCount,FailedCount, LastUpdated | export-csv -NoTypeInformation -append $FileOutput
}
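Join-Records compares every row of $tab1 with every row of $tab2, which is fine for a few hundred clients but grows quadratically with the number of computers. A hashtable keyed on the join column does the same inner join with one lookup per row. A minimal sketch, assuming each Computer value appears at most once in the second collection; Join-ByKey and the sample data are hypothetical.

```powershell
function Join-ByKey {
    # Hypothetical helper: inner join of two collections on one shared property
    param($Left, $Right, [string]$Key = 'Computer')

    # Index the right-hand side once (assumes $Key values are unique there)
    $lookup = @{}
    foreach ($row in $Right) { $lookup[$row.$Key] = $row }

    foreach ($l in $Left) {
        $r = $lookup[$l.$Key]
        if ($null -eq $r) { continue }    # inner join: drop rows without a partner

        # Copy all left-hand properties, then any right-hand ones not already present
        $merged = [ordered]@{}
        foreach ($p in $l.PSObject.Properties) { $merged[$p.Name] = $p.Value }
        foreach ($p in $r.PSObject.Properties) {
            if (-not $merged.Contains($p.Name)) { $merged[$p.Name] = $p.Value }
        }
        [pscustomobject]$merged
    }
}

# Made-up sample data shaped like $r1 and $r2 in the question
$r1 = @(
    [pscustomobject]@{ Computer = 'pc1'; LastUpdated = '2020-01-01' }
    [pscustomobject]@{ Computer = 'pc2'; LastUpdated = '2020-01-02' }
)
$r2 = @(
    [pscustomobject]@{ Computer = 'pc2'; NeededCount = 3 }
    [pscustomobject]@{ Computer = 'pc1'; NeededCount = 5 }
    [pscustomobject]@{ Computer = 'pc3'; NeededCount = 1 }
)
Join-ByKey $r1 $r2 | Select-Object Computer, NeededCount, LastUpdated
```

Inside the server loop it would replace the Join-Records call, e.g. `Join-ByKey $r1 $r2 | Select ... | export-csv ...`.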
I think this could be made simpler. Since Select-Object's -Property parameter accepts an array of values, you can create an array of the properties you want to display. The array can be constructed by comparing your two objects' properties and outputting a unique list of those properties.
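$selectProperties = @($r1)[0].psobject.properties.name | Compare-Object @($r2)[0].psobject.properties.name -IncludeEqual -PassThru
@($r1) + @($r2) | Select-Object -Property $selectProperties
Compare-Object by default outputs only the differences between a reference object and a difference object. Adding the -IncludeEqual switch outputs both the different and the equal items, and adding -PassThru outputs the compared objects themselves (here, the property names) rather than the default PSCustomObject wrapper. Taking element [0] matters when $r1 or $r2 holds several objects, because .psobject on an array describes the array itself, not its items. Note that this only builds a combined column list; it does not merge rows by Computer, so each object from $r1 and $r2 still becomes its own row.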

CSV file - count distinct, group by, sum

I have a file that looks like the following:
- Visitor ID,Revenue,Channel,Flight
- 1234,100,Email,BA123
- 2345,200,PPC,BA112
- 456,150,Email,BA456
I need to produce a file that contains:
The count of distinct Visitor IDs (3)
The total revenue (450)
The count of each Channel
Email 2
PPC 1
The count of each Flight
BA123 1
BA112 1
BA456 1
So far I have the following code; however, when I run it on the 350 MB file it takes too long and in some cases exceeds the memory limit. As I have to run this function on multiple columns, it goes through the file many times. Ideally I need to do this in a single pass over the file.
$file = 'log.txt'
function GroupBy($columnName)
{
$objects = Import-Csv -Delimiter "`t" $file | Group-Object $columnName |
Select-Object @{n=$columnName;e={$_.Group[0].$columnName}}, Count
for($i=0;$i -lt $objects.count;$I++) {
$line += $columnName +"|"+$objects[$I]."$columnName" +"|Count|"+ $objects[$I].'Count' + $OFS
}
return $line
}
$finalOutput += GroupBy "Channel"
$finalOutput += GroupBy "Flight"
Write-Host $finalOutput
Any help would be much appreciated.
Thanks,
Craig
The fact that you are importing the CSV again for each column is what is killing your script. Try to do the loading once, then reuse the data. For example:
$data = Import-Csv .\data.csv
$flights = $data | Group-Object Flight -NoElement | ForEach-Object {[PsCustomObject]@{Flight=$_.Name;Count=$_.Count}}
$visitors = ($data | Group-Object "Visitor ID" | Measure-Object).Count
$revenue = ($data | Measure-Object Revenue -Sum).Sum
$channel = $data | Group-Object Channel -NoElement | ForEach-Object {[PsCustomObject]@{Channel=$_.Name;Count=$_.Count}}
You can display the data like this:
"Revenue : $revenue"
"Visitors: $visitors"
$flights | Format-Table -AutoSize
$channel | Format-Table -AutoSize
This will probably work, using hashtables.
Pros: it will be faster and use less memory.
Cons: it is by far less readable than Group-Object and requires more code.
To make it even less memory-hungry, read the CSV file row by row through the pipeline instead of storing it all in a variable:
$DistinctVisitors = @{}
$TotalRevenue = 0
$ChannelCount = @{}
$FlightCount = @{}
# Stream the rows one at a time instead of loading the whole file into $data
Import-Csv -Path "C:\temp\data.csv" -Delimiter "," | ForEach-Object {
$DistinctVisitors[$_.'Visitor ID'] = $true
$TotalRevenue += $_.Revenue
if (-not $ChannelCount.ContainsKey($_.Channel)) {
$ChannelCount[$_.Channel] = 0
}
$ChannelCount[$_.Channel] += 1
if (-not $FlightCount.ContainsKey($_.Flight)) {
$FlightCount[$_.Flight] = 0
}
$FlightCount[$_.Flight] += 1
}
$DistinctVisitorsCount = $DistinctVisitors.Keys | Measure-Object | Select-Object -ExpandProperty Count
Write-Output "The count of distinct Visitor IDs: $DistinctVisitorsCount"
Write-Output "The total revenue: $TotalRevenue"
Write-Output "The count of each Channel"
$ChannelCount.Keys | ForEach-Object {
Write-Output "$_ $($ChannelCount[$_])"
}
Write-Output "The count of each Flight"
$FlightCount.Keys | ForEach-Object {
Write-Output "$_ $($FlightCount[$_])"
}

How to use Group-Object on this?

I am trying to get all the accounts from $f which do not match the accounts in $table4 into $accounts. But I also need to check whether the occupant code matches or not.
CSV $f:
Account_no |occupant_code
-----------|------------
12345 | 1
67890 | 2
45678 | 3
DataTable $table4
Account_no |occupant_code
-----------|------------
12345 | 1
67890 | 1
45678 | 3
Current code:
$accounts = Import-Csv $f |
select account_no, occupant_code |
where { $table4.account_no -notcontains $_.account_no }
What this needs to do is to check that occupant_code doesn't match, i.e.:
12345: account and occupant from $f and $table4 match; so it's ignored
67890: account matches $table4, but occupant_code does not match, so it is added to $accounts.
Current result: (empty, since every account number in $f also exists in $table4)
Desired result: 67890
I believe I need to use Group-Object, but I do not know how to use that correctly.
I tried:
Import-Csv $f |
select account_no, occupant_code |
Group-Object account_no |
Where-Object { $_.Group.occupant_code -notcontains $table4.occupant_code }
An alternative to Bill's suggestion would be to fill a hashtable with your reference data ($table4) and look up the occupant_code value for each account from $f, assuming that your account numbers are unique:
$ref = @{}
$table4 | ForEach-Object {
$ref[$_.Account_no] = $_.occupant_code
}
$accounts = Import-Csv $f |
Where-Object { $_.occupant_code -ne $ref[$_.Account_no] } |
Select-Object -Expand Account_no
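Since the question asks about Group-Object: it can build the same lookup table directly with -AsHashTable. Adding -AsString matters, because it turns the keys into plain strings, so a lookup such as $ref['12345'] works even if the DataTable column holds numbers. A minimal sketch with made-up inline sample data; unlike the loop above, it also tolerates an account appearing several times in $table4. With a real DataTable, the rows expose their columns as properties, so piping $table4.Rows should work the same way, though that part is not exercised here.

```powershell
# Made-up sample data matching the question; $table4 is a plain object list here, not a DataTable
$table4 = @(
    [pscustomobject]@{ Account_no = '12345'; occupant_code = '1' }
    [pscustomobject]@{ Account_no = '67890'; occupant_code = '1' }
    [pscustomobject]@{ Account_no = '45678'; occupant_code = '3' }
)
$fRows = @'
Account_no,occupant_code
12345,1
67890,2
45678,3
'@ | ConvertFrom-Csv

# Group-Object builds the lookup table; -AsString makes the keys plain strings
$ref = $table4 | Group-Object -Property Account_no -AsHashTable -AsString

# Keep accounts that are missing from $table4 or whose occupant_code differs
$accounts = $fRows |
    Where-Object {
        -not $ref.ContainsKey($_.Account_no) -or
        $ref[$_.Account_no].occupant_code -notcontains $_.occupant_code
    } |
    Select-Object -ExpandProperty Account_no
$accounts
```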
Compare-Object?
csv1.csv:
Account_no,occupant_code
12345,1
67890,2
45678,3
csv2.csv:
Account_no,occupant_code
12345,1
67890,1
45678,3
PowerShell command:
Compare-Object (Import-Csv .\csv1.csv) (Import-Csv .\csv2.csv) -Property occupant_code -PassThru
Output:
Account_no occupant_code SideIndicator
---------- ------------- -------------
67890 1 =>
67890 2 <=
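Comparing on occupant_code alone compares the two sets of codes without looking at which account each code belongs to, so two accounts that swapped codes would go unnoticed. Passing both columns to -Property makes a row count as equal only when account and code both agree. A minimal sketch with the sample data above inlined via ConvertFrom-Csv instead of read from csv1.csv/csv2.csv; the variable names are illustrative.

```powershell
$csv1 = @'
Account_no,occupant_code
12345,1
67890,2
45678,3
'@ | ConvertFrom-Csv

$csv2 = @'
Account_no,occupant_code
12345,1
67890,1
45678,3
'@ | ConvertFrom-Csv

# A row counts as equal only when BOTH Account_no and occupant_code match
$diff = Compare-Object -ReferenceObject $csv1 -DifferenceObject $csv2 -Property Account_no, occupant_code

# '<=' marks rows that exist only in the reference set ($csv1, the $f side)
$accounts = $diff | Where-Object SideIndicator -eq '<=' | Select-Object -ExpandProperty Account_no
$accounts
```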
$f | InnerJoin $table4 {$Left.Account_no -eq $Right.Account_no -and $Left.occupant_code -ne $Right.occupant_code} @{Account_no = {$Left.$_}} | Format-Table
Result:
occupant_code Account_no
------------- ----------
{2, 1} 67890
For details see: In Powershell, what's the best way to join two tables into one?
In addition to all the other answers, you might be able to leverage the IndexOf() method on arrays
$services = get-service
$services.name.IndexOf("xbgm")
240
I am on a tablet right now and don't have a handy way to test it, but something along these lines might work for you:
$table4.account_no.IndexOf($_.account_no)
should fetch the index at which your account_no lives in $table4, so you could jam it all into one ugly pipe:
$accounts = Import-Csv $f | select account_no, occupant_code |
where { ($table4.account_no -notcontains $_.account_no) -or ($table4[$table4.account_no.IndexOf($_.account_no)].occupant_code -ne $_.occupant_code) }
An inner join or a normal loop might just be cleaner though, especially if you want to add some other stuff in. Since someone posted an innerjoin, you could try a loop like:
$accounts = new-object System.Collections.ArrayList
$testSet = $table4.account_no
foreach($myThing in Import-Csv $f)
{
if($myThing.account_no -in $testSet )
{
$i = $testSet.IndexOf($myThing.account_no)
if($table4[$i].occupant_code -eq $myThing.occupant_code) {continue}
}
$accounts.add($myThing) | out-null
}
Edit for the OP, who mentioned that $table4 is a DataTable:
There is probably a much better way to do this, as I haven't used DataTable before, but this seems to work fine:
$table = New-Object system.Data.DataTable
$col1 = New-Object system.Data.DataColumn Account_no,([string])
$col2 = New-Object system.Data.DataColumn occupant_code,([int])
$table.columns.add($col1)
$table.columns.add($col2)
$row = $table.NewRow()
$row.Account_no = "12345"
$row.occupant_code = 1
$table.Rows.Add($row)
$row = $table.NewRow()
$row.Account_no = "67890"
$row.occupant_code = 1
$table.Rows.Add($row)
$row = $table.NewRow()
$row.Account_no = "45678"
$row.occupant_code = 3
$table.Rows.Add($row)
$testList = @()
$testlist += [pscustomobject]@{Account_no = "12345"; occupant_code = 1}
$testlist += [pscustomobject]@{Account_no = "67890"; occupant_code = 2}
$testlist += [pscustomobject]@{Account_no = "45678"; occupant_code = 3}
$accounts = new-object System.Collections.ArrayList
$testSet = $table.account_no
foreach($myThing in $testList)
{
if($myThing.account_no -in $testSet )
{
$i = $testSet.IndexOf($myThing.account_no)
if($table.Rows[$i].occupant_code -eq $myThing.occupant_code) {continue}
}
$accounts.add($myThing) | out-null
}
$accounts