Optimize Get-ADUser filter - powershell

In AD, I'm trying to identify user accounts where the same EmployeeID value is populated in two or more records. Below is my code (credit: I'm using a Show-Progress function defined here). The Get-ADUser command alone has taken more than 2 hours to fetch all the records, while the other steps (2 to 5) have been pretty quick. Although I've completed the work, I'd like to know whether this could have been done more efficiently with PowerShell.
Get-ADUser -LDAPFilter "(&(ObjectCategory=Person)(objectclass=user)(employeeid=*))" -Properties $properties -Server $server_AD_GC -ResultPageSize 1000 |
# *ISSUE HERE*
# The Get-ADUser extract process seems to work very slow.
# However, it is important to note that the above command will be retrieving more than 200K records
# NOTE: I've inferred that employeeid is an indexed attribute and is replicated to GlobalCatalogs and hence have used it in the filter
Show-Progress -Activity "(1/5) Getting AD Users ..." |
select $selectPropsList -OutVariable results_UsersBaseSet |
Group-Object EmployeeID |
Show-Progress -Activity "(2/5) Grouping on EmployeeID ..." |
? { $_.Count -gt 1 } |
Show-Progress -Activity "(3/5) Filtering only dup EmpID records ..." |
select -Exp Group |
Show-Progress -Activity "(4/5) UnGrouping ..." |
Export-Csv "C:\Users\me\op_GetADUser_w_EmpID_Dupes_EntireForest - $([datetime]::Now.ToString("MM-dd-yyyy_hhmmss")).csv" -NoTypeInformation |
Show-Progress -Activity "(5/5) Exporting ..." |
Out-Null
PS: I've also tried to first export all the user accounts to a CSV file and then post-process it with Excel, but I gave up because the dataset was so large that it was both time- and memory-intensive.
Any suggestion is highly appreciated.

Since we don't know what is in $properties or $selectPropsList, your question is really only about finding out to which users the same EmployeeID has been issued, right?
By default, Get-ADUser already returns these properties:
DistinguishedName, Enabled, GivenName, Name, ObjectClass, ObjectGUID, SamAccountName, SID, Surname, UserPrincipalName
So I guess all you need in addition is the EmployeeID. Collecting LOTS of properties slows things down, so keeping this to a bare minimum helps speed things up.
Next, by using the Show-Progress script you have linked to, you will slow down the execution of the script considerably. Do you really need to have a progress bar?
Why not simply write the lines with activity steps directly to the console?
Also, piping everything together doesn't help in the speed department either.
$server_AD_GC = 'YourServer'
$selectPropsList = 'EmployeeID', 'Name', 'SamAccountName', 'Enabled'
$outFile = "C:\Users\me\op_GetADUser_w_EmpID_Dupes_EntireForest - $([datetime]::Now.ToString("MM-dd-yyyy_hhmmss")).csv"
Write-Host "Step (1/4) Getting AD Users ..."
$users = Get-ADUser -Filter "EmployeeID -like '*'" -Properties EmployeeID -Server $server_AD_GC -ResultPageSize 1000
Write-Host "Step (2/4) Grouping on EmployeeID ..."
$dupes = $users | Group-Object -Property EmployeeID | Where-Object { $_.Count -gt 1 }
Write-Host "Step (3/4) Collecting duplicates ..."
$result = foreach ($group in $dupes) {
$group.Group | Select-Object $selectPropsList
}
Write-Host "Step (4/4) Exporting ..."
$result | Export-Csv -Path $outFile -NoTypeInformation
Write-Host "All done" -ForegroundColor Green
P.S. Get-ADUser already returns user objects only, so there is no need for the LDAP filter (ObjectCategory=Person)(objectclass=user). Using -Filter "EmployeeID -like '*'" is probably faster.
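As a possible further speed-up for step 2 (not part of the answer above): Group-Object is known to be slow on large inputs in Windows PowerShell 5.1. The following is a minimal sketch that builds the groups with a hashtable instead. `Find-DuplicateEmployeeId` is a hypothetical helper name, and the three sample users are made-up stand-ins for the Get-ADUser output. The sketch assumes every object has a non-null EmployeeID, which the `EmployeeID -like '*'` filter should guarantee.

```powershell
# Hypothetical helper: return only the objects whose EmployeeID occurs more than once.
function Find-DuplicateEmployeeId {
    param(
        [Parameter(Mandatory)] [object[]] $InputObject,
        [string] $Property = 'EmployeeID'
    )
    # Map each EmployeeID to the list of objects that carry it.
    $byId = @{}
    foreach ($user in $InputObject) {
        $key = $user.$Property
        if (-not $byId.ContainsKey($key)) {
            $byId[$key] = [System.Collections.Generic.List[object]]::new()
        }
        $byId[$key].Add($user)
    }
    # Emitting a list enumerates it, so individual user objects are output.
    foreach ($list in $byId.Values) {
        if ($list.Count -gt 1) { $list }
    }
}

# Made-up sample data; in practice something like:
# $users = Get-ADUser -Filter "EmployeeID -like '*'" -Properties EmployeeID -Server $server_AD_GC
$users = @(
    [pscustomobject]@{ SamAccountName = 'alice'; EmployeeID = '100' }
    [pscustomobject]@{ SamAccountName = 'bob';   EmployeeID = '200' }
    [pscustomobject]@{ SamAccountName = 'carol'; EmployeeID = '100' }
)
$dupes = @(Find-DuplicateEmployeeId -InputObject $users)
```

The output order of the groups is not defined, because hashtables are unordered. Pipe `$dupes` to `Sort-Object EmployeeID` before Export-Csv if the order matters.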

This answer complements Theo's helpful answer and focuses on showing progress during the operation:
The linked Show-Progress function, which is the latest as of this writing:
has an outright bug, in that it doesn't pass pipeline input through (the relevant line is accidentally commented out)
is conceptually flawed in that it doesn't use a process block, which means that all pipeline input is collected first, before it is processed - which defeats the idea of a progress bar.
Therefore, your Show-Progress calls won't show progress until the previous command in the pipeline has emitted all of its output. A simple alternative is to break the pipeline into separate commands and simply emit one progress message before each command, announcing the next stage of processing (rather than per-object progress), as shown in Theo's answer.
Generally, there is no way to show the progress of command-internal processing, only the progress of a command's (multi-object) output.
The simplest way to do this is via a ForEach-Object call in which you call Write-Progress, but that comes with two challenges:
In order to show a percent-complete progress bar, you need to know how many objects there will be in total, which you must determine ahead of time, because a pipeline cannot know how many objects it will receive. Your only option is to collect all output first (or find some other way to count it), then use the collected output as pipeline input, using the count of objects as the basis for calculating the value to pass to Write-Progress -PercentComplete.
Calling Write-Progress for each object received will result in a significant slowdown of overall processing; a compromise is to only call it for every N objects, as shown in this answer; the approach there could be wrapped in a properly implemented function a la Show-Progress that requires passing the total object count as an argument and performs proper streaming input-object processing (via a process block); that said, the mere act of using PowerShell code for passing input objects through is costly.
Conclusion:
Percent-complete progress displays have two inherent problems:
They require you to know the total number of objects to process beforehand (a pipeline has no way of knowing how many objects will pass through it):
Either: Collect all objects to process in memory, beforehand, if feasible; the count of elements in the collection can then serve as the basis for the percent-complete calculations. This may not be an option with very large input sets.
Or: Perform an extra processing step beforehand that merely counts all objects without actually retrieving them. This may not be practical in terms of the additional processing time added.
Object-by-object processing in PowerShell code - either via ForEach-Object or an advanced script/function - is inherently slow.
You can mitigate that somewhat by limiting Write-Progress calls to every N objects, as shown in this answer.
Overall it's a tradeoff between processing speed and the ability to show percent-complete progress to the end user.
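The every-N-objects compromise described above can be sketched as a function with a proper process block. This is a minimal sketch, not the linked Show-Progress function. The name `Show-ThrottledProgress` and its `-TotalCount` and `-UpdateEvery` parameters are hypothetical. As discussed, the caller must supply the total count up front.

```powershell
# Hypothetical streaming progress helper: passes every input object through
# and updates the progress bar only every $UpdateEvery objects.
function Show-ThrottledProgress {
    [CmdletBinding()]
    param(
        [Parameter(ValueFromPipeline)] $InputObject,
        [Parameter(Mandatory)] [string] $Activity,
        [Parameter(Mandatory)] [int] $TotalCount,
        [int] $UpdateEvery = 100
    )
    begin { $i = 0 }
    process {
        $i++
        # Throttle Write-Progress calls, since each call is expensive.
        if ($i % $UpdateEvery -eq 0 -or $i -eq $TotalCount) {
            $pct = [math]::Min(100, [int](($i / [math]::Max(1, $TotalCount)) * 100))
            Write-Progress -Activity $Activity -Status "$i of $TotalCount" -PercentComplete $pct
        }
        # Stream the object on immediately instead of collecting it.
        $InputObject
    }
    end { Write-Progress -Activity $Activity -Completed }
}

# Usage: the total must be known beforehand, here from an in-memory collection.
$items = 1..250
$passed = $items | Show-ThrottledProgress -Activity 'Processing' -TotalCount $items.Count -UpdateEvery 50
```

Because of the process block, each object reaches the next pipeline stage as soon as it arrives. The remaining cost is the per-object overhead of running PowerShell code, which, as noted above, cannot be avoided this way.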

Related

Reading from and appending to CSV or XLSX in Powershell

For the last few years I've been using PowerShell, in my limited capacity, to perform lookups in Active Directory that fill in information missing from a lot of different reports. It usually involves Get-ADUser to find company, location, account status, manager, and a handful of other properties. A report handed to my team is ALWAYS lacking key info and is usually anywhere from 500 to 5000 rows. I export the userID (or whatever property I'm searching by) to a text file and reference that in a foreach loop to get the info I need. I then export that data to CSV, copy it to the original report, make sure the property I'm searching on lines up at the top, middle, and bottom of the sheet, delete what's extra, and go from there.
For reference, here's what I'm using now.
$TotalRows = (get-content '[path to file]').Length
foreach($EmployeeID in Get-Content '[path to file]') {
Write-Progress -Activity 'Processing IDs' -Status 'Searching Active Directory' -PercentComplete ((($rowcounter++) / $TotalRows) * 100)
Get-ADUser -Filter {EmployeeID -eq $EmployeeID} -Properties * | Select-Object EmployeeID,SamAccountName,UserPrincipalName, company, l, enabled, department, title, manager | Export-Csv -NoTypeInformation -Path '[path to output directory]\output.csv' -Append
}
This does a good job of things, but it requires a lot of copy/paste work and checking to make sure things still line up.
The ask: I want to take all the garbage work out of the middle. I want to take a CSV, tell PowerShell to look in, let's say, E7 for a UPN, execute a Get-ADUser query, then export the results to E8 and beyond, depending on how many properties I'm returning. The script would then chew through the rest of the sheet and export Get-ADUser info for each row.
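Import-Csv has no notion of cell addresses such as E7, but the same goal can be approached row by row. The following is a minimal sketch under that assumption: read each row, run a lookup for the value in a chosen column, append the results as extra columns on that same row, and write a new CSV. `Add-LookupColumn` is a hypothetical name. The lookup is passed in as a script block, so the sample below can use a fake in-memory table. In real use it would wrap Get-ADUser, for example `{ param($upn) Get-ADUser -Filter "UserPrincipalName -eq '$upn'" -Properties Company, Department }`.

```powershell
# Hypothetical helper: enrich every CSV row with columns returned by a lookup.
function Add-LookupColumn {
    param(
        [Parameter(Mandatory)] [string] $Path,        # input CSV
        [Parameter(Mandatory)] [string] $OutPath,     # output CSV
        [Parameter(Mandatory)] [string] $KeyColumn,   # column holding e.g. the UPN
        [Parameter(Mandatory)] [string[]] $Property,  # columns to append
        [Parameter(Mandatory)] [scriptblock] $Lookup  # returns one object per key, or $null
    )
    $rows = foreach ($row in Import-Csv -Path $Path) {
        $info = & $Lookup $row.$KeyColumn
        foreach ($name in $Property) {
            # Add every column even when the lookup found nothing, because
            # Export-Csv takes its header from the first row it receives.
            $value = if ($info) { $info.$name } else { $null }
            $row | Add-Member -NotePropertyName $name -NotePropertyValue $value -Force
        }
        $row
    }
    $rows | Export-Csv -Path $OutPath -NoTypeInformation
}

# Made-up sample report and lookup table.
$in  = Join-Path ([System.IO.Path]::GetTempPath()) 'report.csv'
$out = Join-Path ([System.IO.Path]::GetTempPath()) 'report_enriched.csv'
@(
    [pscustomobject]@{ Ticket = '1'; UPN = 'alice@contoso.com' }
    [pscustomobject]@{ Ticket = '2'; UPN = 'nobody@contoso.com' }
) | Export-Csv -Path $in -NoTypeInformation
$fake = @{ 'alice@contoso.com' = [pscustomobject]@{ Company = 'Contoso'; Department = 'IT' } }
Add-LookupColumn -Path $in -OutPath $out -KeyColumn UPN -Property Company, Department -Lookup { param($key) $fake[$key] }
```

Because the looked-up values land on the same row as the key, nothing needs to be copied back or re-aligned by hand afterwards.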

Script has two variables when done, but when I pipe to Select-Object only the first one returns data to the console

I am trying to query multiple servers with WMI, but I don't always have access to the servers.
The code is below. Alas, it returns "access is denied" to the console, but I can't seem to get rid of it. Oh well.
However, I am trapping the servers that I can't connect to, so that I can tell someone else to look at them, or request access.
But when I run the code, it only returns the first list of servers; even if $failed_servers has values, nothing is returned. If I tell both to pipe to ogv, then two windows pop up.
Why won't both "$variable|select" work? If I remove the select on $failed_servers, then it shows up, albeit just sitting immediately underneath the successful ones. Which is okay-but-not-great.
$list = ("servera","serverb","serverc")
$failed_servers = @()
$final = foreach ($server_instance in $list)
{
    $errors = @()
    gwmi -query "select * from win32_service where name like '%SQLSERVER%'" -cn $server_instance -ErrorVariable +errors -ErrorAction SilentlyContinue
    if ($errors.Count -gt 0) {
        $failed_servers += $server_instance
    }
}
$final | select pscomputername, name, startmode, state | where {$_.pscomputername -ne $null}
$failed_servers | select @{N='Failed Servers'; E={$_}}
What you're experiencing is merely a display problem:
Both your Select-Object calls produce output objects with 4 or fewer properties whose types do not have explicit formatting data associated with them (as reported by Get-FormatData).
This causes PowerShell's for-display output formatting system to implicitly render them via the Format-Table cmdlet.
The display columns that Format-Table uses are locked in based on the properties of the very first object that Format-Table receives.
Therefore, your second Select-Object call, whose output objects share no properties with the objects output by the first one, effectively produces no visible output - however, the objects are sent to the success output stream and are available for programmatic processing.
A simple demonstration:
& {
# This locks in Month and Year as the display columns of the output table.
Get-Date | Select-Object Month, Year
# This command's output will effectively be invisible,
# because the property set Name, Attributes does not overlap with
# Month, Year
Get-Item \ | Select-Object Name, Attributes
}
The output will look something like this - note how the second statement's output is effectively invisible (save for an extra blank line):
Month Year
----- ----
9 2021
Note the problem can even affect a single statement that outputs objects of disparate types (whose types don't have associated formatting data); e.g.:
(Get-Date | Select-Object Year), (Get-Item \ | Select-Object Name)
Workarounds:
Applying | Format-List to the command above makes all objects visible, though obviously changes the display format.
Intra-script you could pipe each Select-Object pipeline to Out-Host to force instant, pipeline-specific formatting, but - given that the results are sent directly to the host rather than to the success output stream - this technique precludes further programmatic processing.
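The following small, self-contained sketch shows that the "invisible" objects from the demonstration are still in the success output stream, and that Format-List renders them. The two `[pscustomobject]` literals are made-up stand-ins for the Get-Date and Get-Item output.

```powershell
# The script block emits two objects with disjoint property sets.
$output = & {
    [pscustomobject]@{ Month = 9; Year = 2021 }
    [pscustomobject]@{ Name = 'C:\'; Attributes = 'Directory' }
}
# Both objects were captured, even though implicit Format-Table display
# would show only the columns of the first one.
$output.Count   # → 2
# Format-List renders every object regardless of its property set.
$rendered = $output | Format-List | Out-String
$rendered
```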
Potential future improvements:
GitHub issue #7871 proposes at least issuing a warning if output objects effectively become invisible.

Slow Get-ADUser query

There is something I do not understand. See the following two code examples.
$LDAPResult1 = Get-ADUser -LDAPFilter "(&(objectCategory=user)(sAMAccountName=*))" -Properties @("distinguishedName","sAMAccountName","extensionAttribute13") -SearchBase "ou=test,dc=test,dc=ch"
$LDAPElements1 = @{}
$LDAPResult1 | % {$LDAPElements1.Add($_.SAMAccountName, $_.extensionattribute13)}
compared with (adding a specific server to ask "-Server 'dc.test.test.ch'"):
$LDAPResult1 = Get-ADUser -LDAPFilter "(&(objectCategory=user)(sAMAccountName=*))" -Properties @("distinguishedName","sAMAccountName","extensionAttribute13") -SearchBase "ou=test,dc=test,dc=ch" -Server 'dc.test.test.ch'
$LDAPElements1 = @{}
$LDAPResult1 | % {$LDAPElements1.Add($_.SAMAccountName, $_.extensionattribute13)}
The first code takes 30 seconds, the second about 5 minutes. The problem is not the AD query itself, which takes around 30 seconds in both cases. What differs is filling the results into the hash table. It seems as if, in the second case, some data is still being requested from the DC while the hash table is filled.
What is also interesting: when I wait five minutes after running the AD query in case two and then fill the hash table, the command takes a second.
I would rather specify which server the command connects to, so that the following commands run against the same DC, but this does not make sense if it takes that long.
Can anyone enlighten me …
Addendum: we are talking about 26,000 accounts.
I was able to replicate this. The behaviour does change when you specify the -Server parameter vs. when you don't.
I used Process Monitor to watch network activity and it definitely is talking to the DC when looping through the results returned from using the -Server parameter.
I can't explain why, but it seems like the ADUser objects returned are not populated with the properties from the search. So when they are accessed, it loads the properties from the DC. I could see this when accessing one particular element in the array:
$LDAPResult1[1000]
It displayed the properties, but I also saw network activity in Process Monitor. Whereas I do not see network activity when accessing one element from the results returned when not using the -Server parameter.
So that kind of explains what is happening, but not why. And I really don't know why.
However, I have learned that if you want performance when talking to AD, you have to scrap all the "easy" ways and do things yourself. For example, use the .NET DirectoryEntry and DirectorySearcher classes directly, which can be done in PowerShell using the "type accelerators" [adsi] and [adsisearcher]. For example, this will do the same and will perform consistently:
$dc = "dc.test.test.ch"
$searchBase = "ou=test,dc=test,dc=ch"
$searcher = [adsisearcher]::new([adsi]"LDAP://$dc/$searchBase", "(objectCategory=user)")
$searcher.PropertiesToLoad.Add("sAMAccountName") > $null
$searcher.PropertiesToLoad.Add("extensionAttribute13") > $null
$searcher.PageSize = 1000
$LDAPElements1 = @{}
foreach ($result in $searcher.FindAll()) {
$LDAPElements1.Add($result.Properties["sAMAccountName"][0], $result.Properties["extensionAttribute13"][0])
}
I found the following code to be extremely slow.
$user = Get-ADUser -LDAPFilter $filter -Server "xyc" -Properties "sAMAccountName"
I was able to rewrite it as:
$directorySearcher = New-Object System.DirectoryServices.DirectorySearcher
$directorySearcher.SearchRoot = [ADSI]'LDAP://xyz'
[void]$directorySearcher.PropertiesToLoad.Add('cn')
[void]$directorySearcher.PropertiesToLoad.Add('sAMAccountName')
$directorySearcher.Filter = "(cn=abcd efg)"
$results = $directorySearcher.FindOne()
Write-Host ($results.Properties["samaccountname"][0] -as [string])
and it was a lot faster (by an order of magnitude) than using Get-ADUser, but still slow.
Export the Get-ADUser results to a temporary CSV file and import them back as plain objects.
Get-ADUser -LDAPFilter (....) | Export-Csv -Path "TempCSV.csv" -Encoding UTF8 -Delimiter ","
$ADUsers = Import-Csv -Path "TempCSV.csv" -Encoding UTF8 -Delimiter ","
Now you can loop the users object.
foreach ($ADUser in $ADUsers) { (....) }
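The same round trip can be done in memory with ConvertTo-Csv and ConvertFrom-Csv, which avoids the temporary file. The following is a minimal sketch. It assumes the in-memory version helps for the same reason as the file-based one. The answer does not say why; presumably the results become plain objects with string values that no longer refer back to the DC. The sample objects are hypothetical stand-ins for `Get-ADUser -LDAPFilter (....)` output. Every value comes back as a string, and multi-valued attributes do not survive as collections.

```powershell
# Hypothetical stand-ins for Get-ADUser results.
$adUsers = @(
    [pscustomobject]@{ SamAccountName = 'alice'; extensionAttribute13 = 'A13' }
    [pscustomobject]@{ SamAccountName = 'bob';   extensionAttribute13 = 'B13' }
)
# Round-trip through CSV text in memory instead of a temporary file.
$plain = $adUsers | ConvertTo-Csv -NoTypeInformation | ConvertFrom-Csv
# Build the same SamAccountName -> extensionAttribute13 hash table as in the question.
$lookup = @{}
foreach ($u in $plain) { $lookup[$u.SamAccountName] = $u.extensionAttribute13 }
```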

pipelined Powershell cmdlet showing partial result?

I am writing some C# program that executes PowerShell script.
I have the following line
Get-Mailbox -ResultSize:unlimited |
Get-MailboxPermission |
Where {($_.IsInherited -eq $false) -and !($_.user -like "S-1*") -and !($_.user -like "NT A*") } |
select identity,user,@{n="objectid";e={(get-recipient -identity $_.user).ExternalDirectoryObjectId}}
basically it finds all mailbox permissions and retrieves corresponding ExternalDirectoryObjectId (which is same as Azure ObjectID)
The issue is that the result differs from machine to machine. I get all the identity and user values, but the values of the calculated objectid property only start to show up halfway through the execution.
for example on computer x
Identity|User|objectid
user1 |userA|
user2 |userA|
user2 |userB|
... |... |
user10|userC|
user11|userC|<objectID1>
user11|userD|<objectID2>
I noticed that on a fast computer the objectIDs start showing up late, while on slower computers they start showing up early, although the execution times differ.
How do I modify this so that objectGuid is retrieved for all entries? Why is pipelining not waiting until the calculated property objectID is properly retrieved?
If I write a short PowerShell script and use for loops for each mboxpermissions and retrieve them one by one, all of those objectGuids are retrievable. But it's slow.
Thanks for the help, and please give me any suggestions!

PowerShell Office 365 Script to get user and mailbox information together

I am brand new to PowerShell (started this morning). I have successfully connected to my Office 365 and have been able to get lists of users from Office 365 and mailbox fields from the Exchange portion. What I can't figure out is how to combine them.
What I am looking for is the ability to export certain fields from the mailbox object but only for those mailboxes that belong to a non-blocked, licensed Office 365 users. We have a lot of users whose mailboxes have not been removed but they may no longer be licensed or they may be blocked.
Here are the two exports I have running now. They are complete exports. I tried to filter to the Office 365 users by IsLicensed, but I never got any results, so I just downloaded everything and post-processed it with Excel. But I need to run this on a regular basis...
Here's the code:
Get-Mailbox -ResultSize Unlimited | Select-Object DisplayName,Name,PrimarySMTPAddress,CustomAttribute2 | Export-CSV C:\temp\o365\mailboxes.csv
Get-MsolUser -all | Select-Object SignInName, DisplayName, Office, Department, Title, IsLicensed | export-csv c:\temp\o365\Users.csv
Any assistance would be appreciated.
Okay, so as I understand what you're trying to do... You want to get a list of all O365 users for whom the IsLicensed property is $true and the BlockCredential property is $false. Of those users, you then want to pull some data from their mailbox objects; DisplayName, Name, PrimarySMTPAddress, and CustomAttribute2.
There are a couple of ways that we can do this. The first is easier to throw together in the shell but takes longer to actually run. The second requires some set up but completes quickly.
First method
Since we know what our criteria are for what we want from Get-MsolUser, we'll use the pipeline to pull out what we want and toss it straight into Get-Mailbox.
Get-MsolUser -All | Where-Object {$_.IsLicensed -eq $true -and $_.BlockCredential -eq $false} |
Select-Object UserPrincipalName |
ForEach-Object {Get-Mailbox -Identity $_.UserPrincipalName | Select-Object DisplayName,Name,PrimarySMTPAddress,CustomAttribute2}
O365 PowerShell doesn't like giving us ways to filter our initial query, so we handle that in the second step, here...
Where-Object {$_.IsLicensed -eq $true -and $_.BlockCredential -eq $false}
This means that, for each item passed in from Get-MsolUser -All, we only want those that have the IsLicensed property set to $true and BlockCredential set to $false.
Now, we only care about the results of Get-MsolUser in so far as determining what mailboxes to look up, so we'll grab a single property from each of the objects matching our prior filter.
Select-Object UserPrincipalName
If you only run everything up to this point, you'd get a list of UPNs in your shell for all of the accounts that we're now going to pipe into Get-Mailbox.
Moving on to our loop in this... If you haven't learned ForEach-Object yet, it's used to run a scriptblock (everything between the {}) against each item in the pipeline, one at a time.
Get-Mailbox -Identity $_.UserPrincipalName
Welcome to the pipeline variable ($_). Our previous Select-Object is feeding a collection of objects through the pipeline, and this placeholder variable holds each one as we work on it. Since these objects all have a UserPrincipalName property, we reference that for the value to pass to the Identity parameter of Get-Mailbox.
Sidebar
Here's a simple example of how this works:
PS> 1,2,3 | ForEach-Object {Write-Host $_}
1
2
3
Each item is passed along the pipeline, where we write them out one at a time. This is very similar to your standard foreach loop. You can learn more about their differences in this Scripting Guy post.
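Here are the two loop styles from the sidebar side by side. Both produce the same values. The difference is that ForEach-Object processes items as they stream through the pipeline, while the foreach statement iterates over a collection that already exists in memory.

```powershell
# Pipeline version: ForEach-Object receives each item via $_.
$viaPipeline = 1, 2, 3 | ForEach-Object { $_ * 2 }
# Statement version: foreach binds each element of an existing collection to $n.
$viaStatement = foreach ($n in 1, 2, 3) { $n * 2 }
```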
Moving on...
Select-Object DisplayName,Name,PrimarySMTPAddress,CustomAttribute2
We wrap it up with one last Select-Object for the information that you want. You can then look at the results in the shell or pipe into Export-Csv for working with in Excel.
Now... Since the pipeline works sequentially, there's some overhead to it. We're running a command, collecting the results, and then passing those results into the next command one at a time. When we get to our Get-Mailbox, we're actually running Get-Mailbox once for every UPN that we've collected. This takes about 2.5 minutes in my organization, and we have fewer than 500 mailboxes. If you're working with larger numbers, the time to run this can grow very quickly.
Second method
Since a large amount of the processing overhead in the first method is with using the pipeline, we can eliminate most of it by handling our data collection as early and thoroughly as possible.
$Users = Get-MsolUser -All | Where-Object {$_.IsLicensed -eq $true -and $_.BlockCredential -eq $false} | Select-Object -ExpandProperty UserPrincipalName
$Mailboxes = Get-Mailbox -ResultSize Unlimited | Select-Object UserPrincipalName,DisplayName,Name,PrimarySMTPAddress,CustomAttribute2
$Results = foreach ($User in $Users) {
$Mailboxes | Where-Object UserPrincipalName -eq $User
}
$Results | Export-Csv myFile.csv
The first 2 lines are pretty self-explanatory. We get all the user account info that we care about (just the UPNs) and then we grab all the mailbox properties that we care about.
foreach ($User in $Users)
Each entry in $Users will be stored in $User, where we'll then use it in the scriptblock that follows (in the {}).
$Mailboxes | Where-Object UserPrincipalName -eq $User
Each item in $Mailboxes is piped into Where-Object where we then check if the UserPrincipalName property is equal to the current value of $User. All of the matches are then stored in $Results, which can again be piped to Export-Csv for work in Excel.
While this method is harder to write out in the shell, and requires a little extra initial set up, it runs significantly faster; 22 seconds for my org, versus 2.5 minutes with the first method.
I should also point out that the use of UserPrincipalName with the mailbox dataset is simply to help ensure a solid match between those and the account dataset. If you don't want it in your final results, you can always pipe $Results into another Select-Object and specify only the properties that you care about.
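The second method could be refined further (not measured here). Its foreach loop still pipes the entire $Mailboxes collection through Where-Object once per user, so the work grows with users × mailboxes. The following sketch indexes the mailboxes by UPN in a hashtable first, so each user costs a single lookup. The sample data is made up. PowerShell's default hashtable compares string keys case-insensitively, which suits UPNs.

```powershell
# Made-up stand-ins for the $Users and $Mailboxes collections from the second method.
$Users = 'alice@contoso.com', 'carol@contoso.com'
$Mailboxes = @(
    [pscustomobject]@{ UserPrincipalName = 'alice@contoso.com'; DisplayName = 'Alice' }
    [pscustomobject]@{ UserPrincipalName = 'bob@contoso.com';   DisplayName = 'Bob' }
    [pscustomobject]@{ UserPrincipalName = 'carol@contoso.com'; DisplayName = 'Carol' }
)
# Index the mailboxes by UPN once...
$mailboxByUpn = @{}
foreach ($mb in $Mailboxes) { $mailboxByUpn[$mb.UserPrincipalName] = $mb }
# ...then each user is one hashtable lookup instead of a full Where-Object scan.
$Results = foreach ($User in $Users) {
    if ($mailboxByUpn.ContainsKey($User)) { $mailboxByUpn[$User] }
}
```

`$Results` can then be piped to Export-Csv exactly as before.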
Hope this helps!