Compare more than two strings - powershell

Here is what I am trying to achieve...
I have to view the ADAM DB in VMware to see the replication times. My question is: how would I compare more than two strings using the Compare-Object cmdlet? I cannot find any articles on comparing more than two values.
This is what I started writing. I am trying to make this as dynamic as possible...
#PORT FOR LDAP
$ldap = 389;
#PATH
$path = 'DC=vdi,DC=vmware,DC=int';
#SERVERS
$vm = @("fqdn" , "fqdn" , "fqdn");
#ARRAY FOR LOOP
$comp = @();
#LOOP FOR ARRAY COMPARE
for($i = 1; $i -le $vm.count; $i++)
{
$comp += repadmin.exe /showrepl $svr":"$ldap $path | Select-String "Last attempt";
}
#CREATE DYNAMIC VARIABLES
for($i = 0; $i -le ($comp.count - 1); $i++)
{
New-Variable -name repl$i -Value $comp[$i];
}
Thank you in advance!

As I mentioned in my comment, your question is too vague for us to provide a good answer for your situation, so I'll focus on "compare more than two strings". To do this, I would recommend Group-Object. For example:
$data = @"
==== INBOUND NEIGHBORS ======================================
CN=Configuration,CN={B59C1E29-972F-455A-BDD5-1FA7C1B7D60D}
....
Last attempt @ 2010-05-28 07:29:34 was successful.
CN=Schema,CN=Configuration,CN={B59C1E29-972F-455A-BDD5-1FA7C1B7D60D}
....
Last attempt @ 2010-05-28 07:29:34 was successful.
OU=WSFG,DC=COM
....
Last attempt @ 2010-05-28 07:29:35 failed, result -2146893008
(0x8009033
0):
"@ -split [environment]::NewLine
$comp = $data | Select-String "Last attempt"
$comp | Group-Object
Count Name Group
----- ---- -----
2 Last attempt @ 2010-05-28 07:29:34 was successful. { Last atte...
1 Last attempt @ 2010-05-28 07:29:35 failed, result -2146893008 { Last atte...
Group-Object and PowerShell are very flexible, so you could customize this to, for example, display the server names and statuses for the servers that weren't equal to the rest (e.g. count = 1, or not in any of the biggest groups). I won't go into more detail, though, because I have no idea what you are trying to achieve and would probably just waste both our time.
Summary: What I can tell you is that I would probably (99% sure) use Group-Object to "compare more than two strings".
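As a rough sketch of the kind of customization described above: group the per-server strings, treat the largest group as the expected value, and list the servers whose string differs. The server names and the `$results` objects are made up for illustration; in practice the status strings would come from the repadmin output.

```powershell
# Hypothetical per-server results (names are placeholders, not real hosts).
$results = @(
    [pscustomobject]@{ Server = 'server1'; Status = 'Last attempt @ 2010-05-28 07:29:34 was successful.' }
    [pscustomobject]@{ Server = 'server2'; Status = 'Last attempt @ 2010-05-28 07:29:34 was successful.' }
    [pscustomobject]@{ Server = 'server3'; Status = 'Last attempt @ 2010-05-28 07:29:35 failed, result -2146893008' }
)

# Group by the string being compared and put the largest group first.
$groups   = $results | Group-Object -Property Status | Sort-Object -Property Count -Descending
$majority = $groups | Select-Object -First 1

# Every server outside the largest group differs from the rest.
$outliers = $groups | Select-Object -Skip 1 | ForEach-Object { $_.Group }

"Most common value ($($majority.Count) servers): $($majority.Name)"
$outliers | Select-Object Server, Status
```

If two groups tie for the largest count, which one sorts first is not defined here, so a real script would need its own rule for that case.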

Related

How to speed up processing of ~million lines of text in log file

I am trying to parse a very large log file that consists of space-delimited text across about 16 fields. Unfortunately the app logs a blank line in between each legitimate one (needlessly doubling the lines I must process). It also causes fields to shift because it uses a space both as the delimiter and for empty fields. I couldn't get around this in LogParser. Fortunately PowerShell lets me reference fields from the end as well, making it easier to get the later fields affected by the shift.
After a bit of testing with smaller sample files, I've determined that processing line by line as the file is streaming with Get-Content natively is slower than just reading the file completely using Get-Content -ReadCount 0 and then processing from memory. This part is relatively fast (<1min).
The problem comes when processing each line, even though it's in memory. It is taking hours for a 75MB file with 561178 legitimate lines of data (minus all the blank lines).
I'm not doing much in the code itself. I'm doing the following:
Splitting line via space as delimiter
One of the fields is an IP address that I am reverse DNS resolving, which is obviously going to be slow. So I have wrapped this into more code to create an in-memory arraylist cache of previously resolved IPs and pulling from it when possible. The IPs are largely the same so after a few hundred lines, resolution shouldn't be an issue any longer.
Saving the needed array elements into my pscustomobject
Adding pscustomobject to arraylist to be used later.
During the loop I'm tracking how many lines I've processed and outputting that info in a progress bar (I know this adds extra time but not sure how much). I really want to know progress.
All in all, it's processing some 30-40 lines per second, but obviously this is not very fast.
Can someone offer alternative methods/objectTypes to accomplish my goals and speed this up tremendously?
Below are some samples of the log with the field shift (Note this is a Windows DNS Debug log) as well as the code below that.
10/31/2022 12:38:45 PM 2D00 PACKET 000000B25A583FE0 UDP Snd 127.0.0.1 6c94 R Q [8385 A DR NXDOMAIN] AAAA (4)pool(3)ntp(3)org(0)
10/31/2022 12:38:45 PM 2D00 PACKET 000000B25A582050 UDP Snd 127.0.0.1 3d9d R Q [8081 DR NOERROR] A (4)pool(3)ntp(3)org(0)
NOTE: the issue in this case being [8385 A DR NXDOMAIN] (4 fields) vs [8081 DR NOERROR] (3 fields)
Other examples would be the "R Q" where sometimes it's " Q".
$Logfile = "C:\Temp\log.txt"
[System.Collections.ArrayList]$LogEntries = @()
[System.Collections.ArrayList]$DNSCache = @()
# Initialize log iteration counter
$i = 1
# Get Log data. Read entire log into memory and save only lines that begin with a date (ignoring blank lines)
$LogData = Get-Content $Logfile -ReadCount 0 | % {$_ | ? {$_ -match "^\d+\/"}}
$LogDataTotalLines = $LogData.Length
# Process each log entry
$LogData | ForEach-Object {
$PercentComplete = [math]::Round(($i/$LogDataTotalLines * 100))
Write-Progress -Activity "Processing log file . . ." -Status "Processed $i of $LogDataTotalLines entries ($PercentComplete%)" -PercentComplete $PercentComplete
# Split line using space, including sequential spaces, as delimiter.
# NOTE: Due to how app logs events, some fields may be blank leading split yielding different number of columns. Fortunately the fields we desire
# are in static positions not affected by this, except for the last 2, which can be referenced backwards with -2 and -1.
$temp = $_ -Split '\s+'
# Resolve DNS name of IP address for later use and cache into arraylist to avoid DNS lookup for same IP as we loop through log
If ($DNSCache.IP -notcontains $temp[8]) {
$DNSEntry = [PSCustomObject]@{
IP = $temp[8]
DNSName = Resolve-DNSName $temp[8] -QuickTimeout -DNSOnly -ErrorAction SilentlyContinue | Select -ExpandProperty NameHost
}
# Add DNSEntry to DNSCache collection
$DNSCache.Add($DNSEntry) | Out-Null
# Set resolved DNS name to that which came back from Resolve-DNSName cmdlet. NOTE: value could be blank.
$ResolvedDNSName = $DNSEntry.DNSName
} Else {
# DNSCache contains resolved IP already. Find and Use it.
$ResolvedDNSName = ($DNSCache | ? {$_.IP -eq $temp[8]}).DNSName
}
$LogEntry = [PSCustomObject]@{
Datetime = $temp[0] + " " + $temp[1] + " " + $temp[2] # Combines first 3 fields Date, Time, AM/PM
ClientIP = $temp[8]
ClientDNSName = $ResolvedDNSName
QueryType = $temp[-2] # Second to last entry of array
QueryName = ($temp[-1] -Replace "\(\d+\)",".") -Replace "^\.","" # Last entry of array. Replace any "(#)" characters with period and remove first period for friendly name
}
# Add LogEntry to LogEntries collection
$LogEntries.Add($LogEntry) | Out-Null
$i++
}
Here is a more optimized version you can try.
What changed?:
Removed Write-Progress, especially because it's not known if Windows PowerShell is used. PowerShell versions below 6 have a big performance impact with Write-Progress
Changed $DNSCache to Generic Dictionary for fast lookups
Changed $LogEntries to Generic List
Switched from Get-Content to switch -Regex -File
$Logfile = 'C:\Temp\log.txt'
$LogEntries = [System.Collections.Generic.List[psobject]]::new()
$DNSCache = [System.Collections.Generic.Dictionary[string, psobject]]::new([System.StringComparer]::OrdinalIgnoreCase)
# Process each log entry
switch -Regex -File ($Logfile) {
'^\d+\/' {
# Split line using space, including sequential spaces, as delimiter.
# NOTE: Due to how app logs events, some fields may be blank leading split yielding different number of columns. Fortunately the fields we desire
# are in static positions not affected by this, except for the last 2, which can be referenced backwards with -2 and -1.
$temp = $_ -Split '\s+'
$ip = [string] $temp[8]
$resolvedDNSRecord = $DNSCache[$ip]
if ($null -eq $resolvedDNSRecord) {
$resolvedDNSRecord = [PSCustomObject]@{
IP = $ip
DNSName = Resolve-DnsName $ip -QuickTimeout -DnsOnly -ErrorAction Ignore | select -ExpandProperty NameHost
}
$DNSCache[$ip] = $resolvedDNSRecord
}
$LogEntry = [PSCustomObject]@{
Datetime = $temp[0] + ' ' + $temp[1] + ' ' + $temp[2] # Combines first 3 fields Date, Time, AM/PM
ClientIP = $ip
ClientDNSName = $resolvedDNSRecord.DNSName
QueryType = $temp[-2] # Second to last entry of array
QueryName = ($temp[-1] -Replace '\(\d+\)', '.') -Replace '^\.', '' # Last entry of array. Replace any "(#)" characters with period and remove first period for friendly name
}
# Add LogEntry to LogEntries collection
$LogEntries.Add($LogEntry)
}
}
If it's still slow, there is still the option to use Start-ThreadJob as a multithreading approach with chunked lines (like 10000 per job).
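To see the blank-line filter, the field shift, and the dictionary cache working together without touching DNS, here is a small offline sketch. It writes the two sample lines from the question to a temporary file and uses a stand-in function, `Resolve-Stub` (hypothetical, not a real cmdlet), in place of Resolve-DnsName so that the number of lookups can be counted.

```powershell
# Two sample lines from the question, with the blank lines the app writes in between.
$sample = @(
    '10/31/2022 12:38:45 PM 2D00 PACKET 000000B25A583FE0 UDP Snd 127.0.0.1 6c94 R Q [8385 A DR NXDOMAIN] AAAA (4)pool(3)ntp(3)org(0)'
    ''
    '10/31/2022 12:38:45 PM 2D00 PACKET 000000B25A582050 UDP Snd 127.0.0.1 3d9d R Q [8081 DR NOERROR] A (4)pool(3)ntp(3)org(0)'
    ''
)
$tempFile = [System.IO.Path]::GetTempFileName()
Set-Content -Path $tempFile -Value $sample

# Stand-in for Resolve-DnsName (assumption for this sketch); counts how often it is called.
$script:lookups = 0
function Resolve-Stub([string]$Ip) { $script:lookups++; "host-for-$Ip" }

$cache   = [System.Collections.Generic.Dictionary[string, string]]::new()
$entries = [System.Collections.Generic.List[psobject]]::new()

switch -Regex -File $tempFile {
    '^\d+\/' {
        # Only lines starting with a date reach this block; blank lines are skipped.
        $fields = $_ -split '\s+'
        $ip     = $fields[8]
        $name   = $null
        # TryGetValue is a single hash lookup; the resolver runs only on a cache miss.
        if (-not $cache.TryGetValue($ip, [ref]$name)) {
            $name = Resolve-Stub $ip
            $cache[$ip] = $name
        }
        $entries.Add([pscustomobject]@{
            ClientIP      = $ip
            ClientDNSName = $name
            QueryType     = $fields[-2]   # unaffected by the 3-vs-4 field shift
            QueryName     = ($fields[-1] -replace '\(\d+\)', '.') -replace '^\.', ''
        })
    }
}
Remove-Item $tempFile

$entries | Format-Table -AutoSize
"Resolver calls: $script:lookups"
```

Both lines share the same client IP, so the stub should be called once even though two entries are produced.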

Windows PowerShell: How to parse the log file?

I have an input file with below contents:
27/08/2020 02:47:37.365 (-0516) hostname12 ult_licesrv ULT 5 LiceSrv Main[108 00000 Session 'session1' (from 'vmpms1\app1@pmc21app20.pm.com') request for 1 additional licenses for module 'SA-XT' - 1 licenses have been allocated by concurrent usage category 'Unlimited' (session module usage now 1, session category usage now 1, total module concurrent usage now 1, total category usage now 1)
27/08/2020 02:47:37.600 (-0516) hostname13 ult_licesrv ULT 5 LiceSrv Main[108 00000 Session 'sssion2' (from 'vmpms2\app1@pmc21app20.pm.com') request for 1 additional licenses for module 'SA-XT-Read' - 1 licenses have been allocated by concurrent usage category 'Floating' (session module usage now 2, session category usage now 2, total module concurrent usage now 1, total category usage now 1)
27/08/2020 02:47:37.115 (-0516) hostname141 ult_licesrv CMN 5 Logging Housekee 00000 Deleting old log file 'C:\Program Files\PMCOM Global\License Server\diag_ult_licesrv_20200824_011130.log.gz' as it exceeds the purge threashold of 72 hours
27/08/2020 02:47:37.115 (-0516) hostname141 ult_licesrv CMN 5 Logging Housekee 00000 Deleting old log file 'C:\Program Files\PMCOM Global\License Server\diag_ult_licesrv_20200824_021310.log.gz' as it exceeds the purge threashold of 72 hours
27/08/2020 02:47:37.625 (-0516) hostname150 ult_licesrv ULT 5 LiceSrv Main[108 00000 Session 'session1' (from 'vmpms1\app1@pmc21app20.pm.com') request for 1 additional licenses for module 'SA-XT' - 1 licenses have been allocated by concurrent usage category 'Unlimited' (session module usage now 2, session category usage now 1, total module concurrent usage now 2, total category usage now 1)
I need to generate an output file like the one below:
Date,time,hostname,session_module_usage,session_category_usage,module_concurrent_usage,total_category_usage
27/08/2020,02:47:37.365 (-0516),hostname12,1,1,1,1
27/08/2020,02:47:37.600 (-0516),hostname13,2,2,1,1
27/08/2020,02:47:37.115 (-0516),hostname141,0,0,0,0
27/08/2020,02:47:37.115 (-0516),hostname141,0,0,0,0
27/08/2020,02:47:37.625 (-0516),hostname150,2,1,2,1
The output data order is: Date,time,hostname,session_module_usage,session_category_usage,module_concurrent_usage,total_category_usage.
Put 0,0,0,0 if no entry for session_module_usage,session_category_usage,module_concurrent_usage,total_category_usage
I need to get content from the input file and write the output to another file.
Update
I have created a file input.txt in F drive and pasted the log details into it.
Then I form an array by splitting the file content when a new line occurs like below.
$myList = (Get-Content -Path F:\input.txt) -split '\n'
Now I got 5 items in my array myList. Then I replace the multiple blank spaces with a single blank space and formed a new array by splitting each element by blank space. Then I print the 0 to 3 array elements. Now I need to add the end values (session_module_usage,session_category_usage,module_concurrent_usage,total_category_usage).
PS C:\Users\user> $myList = (Get-Content -Path F:\input.txt) -split '\n'
PS C:\Users\user> $myList.Length
5
PS C:\Users\user> for ($i = 0; $i -le ($myList.length - 1); $i += 1) {
>> $newList = ($myList[$i] -replace '\s+', ' ') -split ' '
>> $newList[0]+','+$newList[1]+' '+$newList[2]+','+$newList[3]
>> }
27/08/2020,02:47:37.365 (-0516),hostname12
27/08/2020,02:47:37.600 (-0516),hostname13
27/08/2020,02:47:37.115 (-0516),hostname141
27/08/2020,02:47:37.115 (-0516),hostname141
27/08/2020,02:47:37.625 (-0516),hostname150
If you really need to filter on the granularity that you're looking for, then you may need to use regex to filter the lines.
This would assume that the rows have similarly labeled lines before the values you're looking for, so keep that in mind.
[System.Collections.ArrayList]$filteredRows = @()
$log = Get-Content -Path C:\logfile.log
foreach ($row in $log) {
    # $row already holds the current line, so it can be matched directly
    $date = ([regex]::Match($row, '^\d+\/\d+\/\d+')).Value
    $time = ([regex]::Match($row, '\d+:\d+:\d+\.\d+\s\(\S+\)')).Value
    $hostname = ([regex]::Match($row, '(?<=\d\d\d\d\) )\w+')).Value
    # \d+ rather than \d, so that usage counts of 10 or more are captured in full
    $sessionModuleUsage = ([regex]::Match($row, '(?<=session module usage now )\d+')).Value
    if (!$sessionModuleUsage) {
        $sessionModuleUsage = 0
    }
    $sessionCategoryUsage = ([regex]::Match($row, '(?<=session category usage now )\d+')).Value
    if (!$sessionCategoryUsage) {
        $sessionCategoryUsage = 0
    }
    $moduleConcurrentUsage = ([regex]::Match($row, '(?<=total module concurrent usage now )\d+')).Value
    if (!$moduleConcurrentUsage) {
        $moduleConcurrentUsage = 0
    }
    $totalCategoryUsage = ([regex]::Match($row, '(?<=total category usage now )\d+')).Value
    if (!$totalCategoryUsage) {
        $totalCategoryUsage = 0
    }
    $hash = [ordered]@{
        Date = $date
        time = $time
        hostname = $hostname
        session_module_usage = $sessionModuleUsage
        session_category_usage = $sessionCategoryUsage
        module_concurrent_usage = $moduleConcurrentUsage
        total_category_usage = $totalCategoryUsage
    }
    $rowData = New-Object -TypeName 'psobject' -Property $hash
    $filteredRows.Add($rowData) > $null
}
# Convert to CSV and strip the quotes that ConvertTo-Csv adds around every value
$csv = $filteredRows | ConvertTo-Csv -NoTypeInformation -Delimiter "," | ForEach-Object {$_ -replace '"',''}
$csv | Out-File C:\results.csv
What essentially needs to happen is that we need to get-content of the log, which returns an array with each item terminated on a newline.
Once we have the rows, we need to grab the values via regex
Since you want zeroes in some of the items if those values don't exist, I have if statements that assign '0' if the regex returns nothing
Finally, we add each filtered item to a PSObject and append that object to an array of objects in each iteration.
Then export to a CSV.
You can probably pick apart the lines with a regex and substrings easily enough. Basically something like the following:
# Iterate over the lines of the input file
Get-Content F:\input.txt |
ForEach-Object {
# Extract the individual fields
$Date = $_.Substring(0, 10)
$Time = $_.Substring(12, $_.IndexOf(')') - 11)
$Hostname = $_.Substring(34, $_.IndexOf(' ', 34) - 34)
$session_module_usage = 0
$session_category_usage = 0
$module_concurrent_usage = 0
$total_category_usage = 0
if ($_ -match 'session module usage now (\d+), session category usage now (\d+), total module concurrent usage now (\d+), total category usage now (\d+)') {
$session_module_usage = $Matches[1]
$session_category_usage = $Matches[2]
$module_concurrent_usage = $Matches[3]
$total_category_usage = $Matches[4]
}
# Create custom object with those properties
New-Object PSObject -Property @{
Date = $Date
time = $Time
hostname = $Hostname
session_module_usage = $session_module_usage
session_category_usage = $session_category_usage
module_concurrent_usage = $module_concurrent_usage
total_category_usage = $total_category_usage
}
} |
# Ensure column order in output
Select-Object Date,time,hostname,session_module_usage,session_category_usage,module_concurrent_usage,total_category_usage |
# Write as CSV - without quotes
ConvertTo-Csv -NoTypeInformation |
ForEach-Object { $_ -replace '"' } |
Out-File F:\output.csv
Whether to pull the date, time, and host name from the line with substrings or regex is probably a matter of taste. The same goes for how strictly the format must be matched, but to me that mostly depends on how rigid the format is. For more free-form things where different lines would match different regexes, or where multiple lines make up a single record, I also quite like switch -Regex to iterate over the lines.
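For comparison, a minimal sketch of the switch -Regex variant might look like the following. The two input lines are abbreviated versions of the question's sample (the middle of each line is replaced by "..."), and the `$usage` variable name is just for illustration.

```powershell
# Abbreviated sample lines: one with usage counters, one without.
$lines = @(
    '27/08/2020 02:47:37.625 (-0516) hostname150 ... (session module usage now 2, session category usage now 1, total module concurrent usage now 2, total category usage now 1)'
    '27/08/2020 02:47:37.115 (-0516) hostname141 ... Deleting old log file ...'
)

$usage = 'session module usage now (\d+), session category usage now (\d+), total module concurrent usage now (\d+), total category usage now (\d+)'

$rows = switch -Regex ($lines) {
    $usage {
        # $Matches holds the capture groups of the pattern that matched this line.
        [pscustomobject]@{
            hostname                = ($_ -split '\s+')[3]
            session_module_usage    = [int]$Matches[1]
            session_category_usage  = [int]$Matches[2]
            module_concurrent_usage = [int]$Matches[3]
            total_category_usage    = [int]$Matches[4]
        }
    }
    default {
        # Lines without usage counters get zeroes, as the question requires.
        [pscustomobject]@{
            hostname                = ($_ -split '\s+')[3]
            session_module_usage    = 0
            session_category_usage  = 0
            module_concurrent_usage = 0
            total_category_usage    = 0
        }
    }
}
$rows | Format-Table -AutoSize
```

Additional record types would simply become additional pattern clauses, which is where this form tends to pay off.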

PowerShell script to group records by overlapping start and end date

I am working on a CSV file which has start and end dates, and the requirement is to group records when their date ranges overlap each other.
For example, in the table below, the Start_Date and End_Date of Bill_Number 177835 overlap with the Start_Date and End_Date of 178682, 179504 and 178990, so all of them should be grouped together, and so on for each and every record.
Bill_Number,Start_Date,End_Date
177835,4/14/20 3:00 AM,4/14/20 7:00 AM
178682,4/14/20 3:00 AM,4/14/20 7:00 AM
179504,4/14/20 3:29 AM,4/14/20 6:29 AM
178662,4/14/20 4:30 AM,4/14/20 5:30 AM
178990,4/14/20 6:00 AM,4/14/20 10:00 AM
178995,4/15/20 6:00 AM,4/15/20 10:00 AM
178998,4/15/20 6:00 AM,4/15/20 10:00 AM
I have tried different combinations such as "Group-by" and a "for loop" but was not able to produce the result.
With the above example CSV, the expected result is:
Group1: 177835,178682,179504, 178990
Group2: 177835,178682,179504, 178662
Group3: 178995, 178998
Currently I have the code below.
Any help on this will be appreciated. Thanks in advance.
$array = @('ab','bc','cd','df')
for ($y = 0; $y -lt $array.count) {
for ($x = 0; $x -lt $array.count) {
if ($array[$y]-ne $array[$x]){
Write-Host $array[$y],$array[$x]
}
$x++
}
$y++
}
You can do something like the following. There is likely a cleaner solution, but that could take a lot of time.
$csv = Import-Csv file.csv
# Creates all inclusive groups where times overlap
$csvGroups = foreach ($row in $csv) {
$start = [datetime]$row.Start_Date
$end = [datetime]$row.End_Date
,($csv | where { ($start -ge [datetime]$_.Start_Date -and $start -le [datetime]$_.End_Date) -or ($end -ge [datetime]$_.Start_Date -and $end -le [datetime]$_.End_Date) })
}
# Removes duplicates from $csvGroups
$groups = $csvGroups | Group {$_.Bill_number -join ','} |
Foreach-Object { ,$_.Group[0] }
# Compares current group against all groups except itself
$output = for ($i = 0; $i -lt $groups.count; $i++) {
$unique = $true # indicates if the group's bill_numbers are in another group
$group = $groups[$i]
$list = $groups -as [system.collections.arraylist]
$list.RemoveAt($i) # Removes self
foreach ($innerGroup in $list) {
# If current group's bill_numbers are in another group, skip to next group
if ((compare $group.Bill_Number $innergroup.Bill_Number).SideIndicator -notcontains '<=') {
$unique = $false
break
}
}
if ($unique) {
,$group
}
}
$groupCounter = 1
# Output formatting
$output | Foreach-Object { "Group{0}:{1}" -f $groupCounter++,($_.Bill_Number -join ",")}
Explanation:
I added comments to give an idea as to what is going on.
The ,$variable syntax uses the unary comma operator (,). It wraps the output in a single-element array. Typically, PowerShell unrolls an array into individual items when it is output; with the wrapper, only the outer array is unrolled and the inner array survives intact. The unrolling becomes a problem here because we want the groups to stay as groups (arrays). Otherwise, there would be a lot of duplicate bill numbers, and we'd lose track of which numbers belong to which group.
An arraylist is used for $list. This is so we can access the RemoveAt() method. A typical array is of fixed size and can't be manipulated in that fashion. This can effectively be done with an array, but the code is different. You either have to select the index ranges around the item you want to skip or create a new array using some other conditional statement that will exclude the target item. An arraylist is just easier for me (personal preference).
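The effect of the unary comma is easiest to see in isolation. This is only an illustrative sketch with made-up values, not part of the solution above:

```powershell
# Without the comma, each two-item array is unrolled into the output stream.
$flat = foreach ($i in 1..2) { @('a', 'b') }

# With the unary comma, each array is wrapped once; unrolling removes only the wrapper.
$nested = foreach ($i in 1..2) { ,@('a', 'b') }

"flat:   $($flat.Count) items"
"nested: $($nested.Count) items, the first of which has $($nested[0].Count) elements"
```

The first collection ends up as four separate strings, while the second keeps two arrays of two strings each.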
So, a very dirty approach. I think there are a couple of ways to determine if there's overlap for a specific comparison, one record to another. However, you may need a list of the bill numbers that each bill's date range collides with. Using a function call in a Select-Object statement/expression, I added a Collisions property to your objects.
The function is wordy and can probably be improved, but the gist is that each record is compared to all other records, and another record's bill number is reported in its Collisions property if either the start or end date falls within the other record's range.
This is of course just demo code, I'm sure it can be made better for your purposes, but may be a starting point for you.
Obviously change the path to the CSV file.
Function Get-Collisions
{
Param(
[Parameter(Mandatory = $true)]
[Object]$ReferenceObject,
[Parameter( Mandatory = $true )]
[Object[]]$CompareObjects
) # End Parameter Block
ForEach($Object in $CompareObjects)
{
If( !($ReferenceObject.Bill_Number -eq $Object.Bill_Number) )
{
If(
( $ReferenceObject.Start_Date -ge $Object.Start_Date -and $ReferenceObject.Start_Date -le $Object.End_Date ) -or
( $ReferenceObject.End_Date -ge $Object.Start_Date -and $ReferenceObject.End_Date -le $Object.End_Date ) -or
( $ReferenceObject.Start_Date -le $Object.Start_Date -and $ReferenceObject.End_Date -ge $Object.Start_Date )
)
{
$Object.Bill_Number
}
}
}
} # End Get-Collisions
$Objects = Import-Csv 'C:\temp\DateOverlap.CSV'
$Objects |
ForEach-Object{
$_.Start_Date = [DateTime]$_.Start_Date
$_.End_Date = [DateTime]$_.End_Date
}
$Objects = $Objects |
Select-Object *,@{Name = 'Collisions'; Expression = { Get-Collisions -ReferenceObject $_ -CompareObjects $Objects }}
$Objects | Format-Table -AutoSize
Let me know how it goes. Thanks.
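As an aside, and assuming every record's start date is not later than its end date, the three-part condition is logically equivalent to the usual two-comparison interval test: two ranges overlap when each one starts no later than the other ends. A small sketch, where `Test-Overlap` is a hypothetical helper name and the dates are taken from the question's table:

```powershell
function Test-Overlap {
    param(
        [datetime]$StartA, [datetime]$EndA,
        [datetime]$StartB, [datetime]$EndB
    )
    # Inclusive bounds, matching the -ge / -le comparisons used above.
    ($StartA -le $EndB) -and ($StartB -le $EndA)
}

# Explicit casts, as in the code above, so the strings are parsed the same way.
$refStart = [datetime]'4/14/20 3:00 AM'    # 177835
$refEnd   = [datetime]'4/14/20 7:00 AM'

# Partial overlap (178990)
$partial   = Test-Overlap $refStart $refEnd ([datetime]'4/14/20 6:00 AM') ([datetime]'4/14/20 10:00 AM')
# Fully contained (178662)
$contained = Test-Overlap $refStart $refEnd ([datetime]'4/14/20 4:30 AM') ([datetime]'4/14/20 5:30 AM')
# Next day, no overlap (178995)
$disjoint  = Test-Overlap $refStart $refEnd ([datetime]'4/15/20 6:00 AM') ([datetime]'4/15/20 10:00 AM')

"partial=$partial contained=$contained disjoint=$disjoint"
```

The single test also covers the case where one range fully contains the other, which is the case that is easy to miss when writing the conditions out one by one.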
@Shan, I saw your comments so I wanted to respond with some additional code and discussion. I may have gone overboard, but you expressed a desire to learn, such that you can maintain these code pieces in the future. So, I put a lot of time into this.
I may mention some of @AdminOfThings' work too. That is not criticism, but collaboration. His example is clever and dynamic in terms of getting the job done and pulling in the right tools as he worked his way to the desired output.
I originally side-stepped the grouping question because I didn't feel like naming/numbering the groups had any meaning. For example: "Group 1" indicates all its members have overlap in their billing periods, but no indication of what or when the overlap is. Maybe I rushed through it… I may have been reading too much into it or perhaps even letting my own biases get in the way. At any rate, I elected to create a relationship from the perspective of each bill number, and that resulted in my first answer.
Since then, and because of your comment, I put effort into extending and documenting the first example I gave. The revised code will be Example 1 below. I've heavily commented it and most of the comments will apply to the original example as well. There are some differences that were forced by the extended grouping functionality, but the comments should reflect those situations.
Note: You'll also see I stopped calling them "collisions" and termed them "overlaps" instead.
Example 1:
Function Get-Overlaps
{
<#
.SYNOPSIS
Given an object (reference object) compare to a collection of other objects of the same
type. Return an array of billing numbers for which the billing period overlaps that of
the reference object.
.DESCRIPTION
Given an object (reference object) compare to a collection of other objects of the same
type. Return an array of billing numbers for which the billing period overlaps that of
the reference object.
.PARAMETER ReferenceObject
This is the current object you wish to compare to all other objects.
.PARAMETER CompareObjects
The collection of objects you want to compare with the reference object.
.NOTES
> The date time casting could probably have been done by further preparing
the objects in the calling code. However, given this is for a
StackOverflow question I can polish that later.
#>
Param(
[Parameter(Mandatory = $true)]
[Object]$ReferenceObject,
[Parameter( Mandatory = $true )]
[Object[]]$CompareObjects
) # End Parameter Block
[Collections.ArrayList]$Return = @()
$R_StartDate = [DateTime]$ReferenceObject.Start_Date
$R_EndDate = [DateTime]$ReferenceObject.End_Date
ForEach($Object in $CompareObjects)
{
$O_StartDate = [DateTime]$Object.Start_Date
$O_EndDate = [DateTime]$Object.End_Date
# The first if statement skips the reference object's bill_number
If( !($ReferenceObject.Bill_Number -eq $Object.Bill_Number) )
{
# This logic can use some explaining. So far as I could tell there were 2 cases to look for:
# 1) Either or both the start and end dates fall inside the timespan of the comparison
# object. This case is handled by the first 2 conditions.
# 2) The reference object's timespan covers the entire timespan of the comparison object,
# meaning the start date is before and the end date is after, so the entire
# comparison timespan is within the bounds of the reference timespan. I elected to use
# the 3rd condition below to detect that case because once the start date is earlier I
# only have to care if the end date is greater than the start date. It's a little more
# inclusive and partially covered by the previous conditions, but whatever, you gotta
# pick something...
#
# Note: This was a deceptively difficult thing to comprehend. I missed that last condition
# in my first example (later corrected) and I think @AdminOfThings also overlooked it.
If(
( $R_StartDate -ge $O_StartDate -and $R_StartDate -le $O_EndDate ) -or
( $R_EndDate -ge $O_StartDate -and $R_EndDate -le $O_EndDate ) -or
( $R_StartDate -le $O_StartDate -and $R_EndDate -ge $O_StartDate )
)
{
[Void]$Return.Add( $Object.Bill_Number )
}
}
}
Return $Return
} # End Get-Overlaps
# Import the CSV first so the full collection already exists when Get-Overlaps is called below.
$Csv = Import-Csv 'C:\temp\DateOverlap.CSV'
$Objects =
$Csv |
ForEach-Object{
# Consider overlap as a relationship from the perspective of a given Object.
$Overlaps = [Collections.ArrayList]@(Get-Overlaps -ReferenceObject $_ -CompareObjects $Csv)
# Knowing the overlaps I can infer the group, by adding the group's bill_number to its group property.
If( $Overlaps )
{ # Don't calculate a group unless you actually have overlaps:
$Group = $Overlaps.Clone()
[Void]$Group.Add( $_.Bill_Number ) # You can do this in the above line, but for readability I separated it.
}
Else { $Group = $null } # Ensures a group from a previous iteration of the loop is not reused.
# Create a new PSCustomObject with the data so far.
[PSCustomObject][Ordered]@{
Bill_Number = $_.Bill_Number
Start_Date = [DateTime]$_.Start_Date
End_Date = [DateTime]$_.End_Date
Overlaps = $Overlaps
Group = $Group | Sort-Object # Sorting will make it a lot easier to get unique lists later.
}
}
# The reason I recreated the objects from the CSV file instead of using Select-Object as I had
# previously is that I simply couldn't get Select-Object to maintain type ArrayList that was being
# returned from the function. I know that's a documented problem or circumstance somewhere.
# Now I'll add one more property called Group_ID a comma delimited string that we can later use
# to echo the groups according to your original request.
$Objects =
$Objects |
Select-Object *,@{Name = 'Group_ID'; Expression = { $_.Group -join ', ' } }
# This output is just for the sake of showing the new objects:
$Objects | Format-Table -AutoSize -Wrap
# Now create an array of unique Group_ID strings. This is possible because of the sorts and joins done earlier.
$UniqueGroups = $Objects.Group_ID | Select-Object -Unique
$Num = 1
ForEach($UniqueGroup in $UniqueGroups)
{
"Group $Num : $UniqueGroup"
++$Num # Increment $Num, using the convenient unary operator, so the next group is echoed properly.
}
# Below is a traditional for loop that does the same thing. I did that first before deciding the ForEach
# was cleaner. Leaving it commented below, because you're on a learning-quest, so just more demo code...
# For($i = 0; $i -lt $UniqueGroups.Count; ++$i)
# {
# $Num = $i + 1
# $UniqueGroup = $UniqueGroups[$i]
# "Group $Num : $UniqueGroup"
# }
Example 2:
$Objects =
Import-Csv 'C:\temp\DateOverlap.CSV' |
Select-Object Bill_Number,
@{ Name = 'Start_Date'; Expression = { [DateTime]$_.Start_Date } },
@{ Name = 'End_Date'; Expression = { [DateTime]$_.End_Date } }
# The above select statement converts the Start_Date & End_Date properties to [DateTime] objects
# While you had asked to pack everything into the nested loops, that would have resulted in
# unnecessary recasting of object types to ensure proper comparison. Often this is a matter of
# preference, but in this case I think it's better. I did have it working well without the
# above select, but the code is more readable / concise with it. So even if you treat the
# Select-Object command as a blackbox the rest of the code should be easier to understand.
#
# Of course, and if you couldn't tell from my samples Select-Object is incredibly useful. I
# recommend taking the time to learn it thoroughly. The MS documentation can be found here:
# https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/select-object?view=powershell-5.1
:Outer ForEach( $ReferenceObject in $Objects )
{
# In other revisions I had assigned these values to some shorter variable names.
# I took that out. Again since you're learning I wanted the all the dot referencing
# to be on full display.
$ReferenceObject.Start_Date = $ReferenceObject.Start_Date
$ReferenceObject.End_Date = $ReferenceObject.End_Date
[Collections.ArrayList]$TempArrList = #() # Reset this on each iteration of the outer loop.
:Inner ForEach( $ComparisonObject in $Objects )
{
If( $ComparisonObject.Bill_Number -eq $ReferenceObject.Bill_Number )
{ # Skip the current reference object in the $Objects collection! This prevents the duplication of
# the current Bill's number within its group, helping to ensure uniqueness.
#
# By now you should have seen across all revisions, including AdminOfThings' demo, that there was some
# need to skip the current item when searching for overlaps, and that there are a number of ways to
# accomplish that. In this case I simply go back to the top of the loop when the current record
# is encountered, effectively skipping it.
Continue Inner
}
# The below logic needs some explaining. So far as I could tell there were 2 cases to look for:
# 1) Either or both the start and end dates fall inside the timespan of the comparison
# object. This case is handled by the first 2 conditions.
# 2) The reference object's timespan covers the entire timespan of the comparison object,
# meaning the start date is before and the end date is after, so the entire
# comparison timespan is within the bounds of the reference timespan. I elected to use
# the 3rd condition below to detect that case because once the start date is earlier I
# only have to care if the end date is greater than the other start date. It's a little
# more inclusive and partially covered by the previous conditions, but whatever, you gotta
# pick something...
#
# Note: This was a deceptively difficult thing to comprehend. I missed that last condition
# in my first example (later corrected) and I think @AdminOfThings also overlooked it.
If(
( $ReferenceObject.Start_Date -ge $ComparisonObject.Start_Date -and $ReferenceObject.Start_Date -le $ComparisonObject.End_Date ) -or
( $ReferenceObject.End_Date -ge $ComparisonObject.Start_Date -and $ReferenceObject.End_Date -le $ComparisonObject.End_Date ) -or
( $ReferenceObject.Start_Date -le $ComparisonObject.Start_Date -and $ReferenceObject.End_Date -ge $ComparisonObject.Start_Date )
)
{
[Void]$TempArrList.Add( $ComparisonObject.Bill_Number )
}
}
# Now Add the properties!
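# Clone the list so that adding the reference bill number below does not also change Overlaps.
$ReferenceObject | Add-Member -Name Overlaps -MemberType NoteProperty -Value ( $TempArrList.Clone() )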
If( $ReferenceObject.Overlaps )
{
[Void]$TempArrList.Add($ReferenceObject.Bill_Number)
$ReferenceObject | Add-Member -Name Group -MemberType NoteProperty -Value ( $TempArrList | Sort-Object )
$ReferenceObject | Add-Member -Name Group_ID -MemberType NoteProperty -Value ( $ReferenceObject.Group -join ', ' )
# Below a script property also works, but I think the above is easier to follow:
# $ReferenceObject | Add-Member -Name Group_ID -MemberType ScriptProperty -Value { $this.Group -join ', ' }
}
Else
{
$ReferenceObject | Add-Member -Name Group -MemberType NoteProperty -Value $null
$ReferenceObject | Add-Member -Name Group_ID -MemberType NoteProperty -Value $null
}
}
# This output is just for the sake of showing the new objects:
$Objects | Format-Table -AutoSize -Wrap
# Now create an array of unique Group_ID strings. This is possible because of the sorts and joins done earlier.
#
# It's important to point out I chose to sort because I saw the clever solution that AdminOfThings
# used. There's a need to display only groups that have unique memberships, not necessarily unique
# ordering of the members. He identified these by doing some additional loops and using the
# Compare-Object cmdlet. Again, I must say that was very clever, and Compare-Object is another tool very much
# worth getting to know. However, the code didn't seem like it cared which of the various orderings it
# ultimately output. Therefore I could conclude the order wasn't really important, and it's fine if the
# groups are sorted. With the objects sorted it's much easier to derive the truly unique lists with the
# simple Select-Object command below.
$UniqueGroups = $Objects.Group_ID | Select-Object -Unique
# Finally Loop through the UniqueGroups
$Num = 1
ForEach($UniqueGroup in $UniqueGroups)
{
"Group $Num : $UniqueGroup"
++$Num # Increment $Num using the convenient unary operator, so the next group is echoed properly.
}
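As a side note, the three-part condition in the inner loop can be collapsed into a single comparison, provided each record's Start_Date is never later than its End_Date (an assumption on my part). Two ranges then overlap exactly when each one starts no later than the other ends. A minimal sketch follows; Test-DateOverlap is a hypothetical helper name and the sample records are made up:

```powershell
# Hypothetical helper: returns $true when two date ranges share at least one point in time.
# Assumes Start_Date -le End_Date on both objects.
function Test-DateOverlap {
    param(
        [Parameter(Mandatory)] $ReferenceObject,
        [Parameter(Mandatory)] $ComparisonObject
    )
    ( $ReferenceObject.Start_Date -le $ComparisonObject.End_Date ) -and
    ( $ComparisonObject.Start_Date -le $ReferenceObject.End_Date )
}

# Made-up sample records, for illustration only
$a = [PSCustomObject]@{ Bill_Number = 1; Start_Date = [datetime]'2020-01-01'; End_Date = [datetime]'2020-01-31' }
$b = [PSCustomObject]@{ Bill_Number = 2; Start_Date = [datetime]'2020-01-15'; End_Date = [datetime]'2020-02-15' }
$c = [PSCustomObject]@{ Bill_Number = 3; Start_Date = [datetime]'2020-03-01'; End_Date = [datetime]'2020-03-31' }

$ab = Test-DateOverlap -ReferenceObject $a -ComparisonObject $b   # ranges intersect
$ac = Test-DateOverlap -ReferenceObject $a -ComparisonObject $c   # ranges are disjoint
```

Whether this reads better than the three explicit conditions is a matter of taste; the explicit version documents each case, while the short form is easier to verify at a glance.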
Additional Discussion:
Hopefully the examples are helpful. I wanted to mention a few more points:
Using ArrayLists ( [System.Collections.ArrayList] ) instead of native arrays. The typical reason to do this is speed and the flexibility to add and remove elements: a native array has a fixed size, so += builds a new array and copies every element each time, while an ArrayList grows in place. It's so common you'll often find experienced PowerShell users implementing it instinctively.
You'll notice I relied heavily on the ability to append new properties to objects. There are several ways to do this: Select-Object, creating your own objects, and Add-Member, which is what I used in Example 2 above. The main reason I used Add-Member was that I couldn't get the ArrayList type to stick when using Select-Object.
Regarding loops. This is specific to your desire for nested loops. My first answer still had loops, except some were implied by the pipe, and others were stored in a helper function. The latter is really also a preference; for readability it's sometimes helpful to park some code out of view from the main code body. That said, all the same concepts were there from the beginning. You should get comfortable with the implied loop that comes with pipe-lining capability.
I don't think there's much more I can say without getting redundant. I really hope this was helpful, it was certainly fun for me to work on it. If you have questions or feedback let me know. Thanks.
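To illustrate the first two points, here is a minimal sketch with made-up data. It shows the ArrayList's Add() method (which returns the index of the new element, hence the [Void] cast) and Add-Member attaching the list to an object as a NoteProperty. The property names mirror the ones used in Example 2; the record and its values are hypothetical.

```powershell
# An ArrayList grows in place; a native array with += is rebuilt on every append.
$TempArrList = [System.Collections.ArrayList]::new()
[Void]$TempArrList.Add( 1002 )   # Add() returns the index of the new element; [Void] discards it
[Void]$TempArrList.Add( 1001 )

# Hypothetical record standing in for one row of the input data
$Record = [PSCustomObject]@{ Bill_Number = 1000 }

# Attach the list as a new property; the property keeps its ArrayList type
$Record | Add-Member -Name Overlaps -MemberType NoteProperty -Value $TempArrList

$TypeName = $Record.Overlaps.GetType().FullName
$GroupId  = ( $Record.Overlaps | Sort-Object ) -join ', '
```

Sorting before the join is what makes two groups with the same members produce the same Group_ID string, regardless of the order in which the overlaps were found.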

Which part of this Powershell code snippet is making it take a long time to run?

I'm tasked with making a report of the last logon time for each user in our AD env. I obviously first asked mother google for something that I could repurpose, but couldn't find anything that would check multiple Domain Controllers, reconcile the most recent logon, and then spit out whether it was past an arbitrarily set date/number of days.
Here's the code:
foreach ($user in $usernames) {
$percentCmpUser = [math]::Truncate(($usernames.IndexOf($user)/$usernames.count)*100)
Write-Progress -Id 3 -Activity "Finding Inactive Accounts" -Status "$($percentCmpUser)% Complete:" -PercentComplete $percentCmpUser
$allLogons = $AllUsers | Where-Object {$_.SamAccountName -match $user}
$finalLogon = $allLogons| Sort-Object LastLogon -Descending |Select-Object -First 1
if ($finalLogon.LastLogon -lt $time.ToFileTime()) {
$inactiveAccounts += $finalLogon
}
}
$usernames is a list of about 6000 usernames
$AllUsers is a list of 18000 users; it includes 10 different properties that I'd like to have access to in my final report. The way I got it was by hitting three of our 20 or so DCs for all users in the specific OUs that I'm concerned with. The final script will actually be 6k*20 because I do need to hit every DC to make sure I don't miss any user's logon.
Here's how $time is calculated:
$DaysInactive = 60
$todayDate = Get-Date
$time = ($todayDate).Adddays(-($DaysInactive))
Each variable is used elsewhere in the script, which is why I break it out like that.
Before you suggest LastLogonTimestamp, I was told it's not current enough and when I asked about changing the replication time to be more current I was told "no, not gonna happen".
Search-ADAccount also doesn't seem to offer an accurate view of inactive users.
I'm open to all suggestions about how to make this specific snippet run faster or on how to use a different methodology to achieve the same result in a fast time.
As of now hitting each DC for all users in specific OUs takes about 10-20sec per DC and then the above snippet takes 30-40 min.
Couple of things stand out, but likely the biggest performance killer here is these two statements:
$percentCmpUser = [math]::Truncate(($usernames.IndexOf($user)/$usernames.count)*100)
# and
$allLogons = $AllUsers | Where-Object {$_.SamAccountName -match $user}
... both of these statements will exhibit O(N^2) (or quadratic) performance characteristics - that is, every time you double the input size, the time taken quadruples!
Array.IndexOf() is effectively a loop
Let's look at the first one:
$percentCmpUser = [math]::Truncate(($usernames.IndexOf($user)/$usernames.count)*100)
It might not be self-evident, but this method call: $usernames.IndexOf() might require iterating through the entire list of $usernames every time it executes - by the time you reach the last $user, it needs to go through and compare $user against all 6000 items.
Two ways you can address this:
Use a regular for loop:
for($i = 0; $i -lt $usernames.Count; $i++) {
$user = $usernames[$i]
$percent = ($i / $usernames.Count) * 100
# ...
}
Stop outputting progress altogether
Write-Progress is really slow - even if the caller suppresses Progress output (eg. $ProgressPreference = 'SilentlyContinue'), using the progress stream still carries overhead, especially when called in every loop iteration.
Removing Write-Progress altogether would remove the requirement for calculating percentage :)
If you still need to output progress information you can shave off some overhead by only calling Write-Progress sometimes - for example once every 100 iterations:
for($i = 0; $i -lt $usernames.Count; $i++) {
$user = $usernames[$i]
if($i % 100 -eq 0){
$percent = ($i / $usernames.Count) * 100
Write-Progress -Id 3 -Activity "Finding Inactive Accounts" -PercentComplete $percent
}
# ...
}
... |Where-Object is also just a loop
Now for the second one:
$allLogons = $AllUsers | Where-Object {$_.SamAccountName -match $user}
... 6000 times, PowerShell has to enumerate all 18000 objects in $AllUsers and test each of them against the Where-Object filter.
Instead of using an array and Where-Object, consider loading all users into a hashtable:
# Only need to run this once, before the loop
$AllLogonsTable = @{}
$AllUsers |ForEach-Object {
# Check if the hashtable already contains an item associated with the user name
if(-not $AllLogonsTable.ContainsKey($_.SamAccountName)){
# Before adding the first item, create an array we can add subsequent items to
$AllLogonsTable[$_.SamAccountName] = @()
}
# Add the item to the array associated with the username
$AllLogonsTable[$_.SamAccountName] += $_
}
foreach($user in $usernames){
# This will be _much faster_ than $AllUsers |Where-Object ...
$allLogons = $AllLogonsTable[$user]
}
Hashtables have crazy-fast lookups - finding an object by key is much faster than using Where-Object on an array.
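Putting the pieces together, a minimal sketch of the whole lookup might look like the following. The sample records and the 60-day cutoff are invented for illustration, and a generic List stands in for the `$inactiveAccounts +=` from the question, since += on an array copies the array on every append. One caveat: a hashtable lookup is an exact (case-insensitive) key match, whereas the original -match does a regex match, so results can differ if one user name is contained in another.

```powershell
# Made-up sample data: one record per user per DC, LastLogon stored as a FILETIME value
$now = Get-Date
$AllUsers = @(
    [PSCustomObject]@{ SamAccountName = 'alice'; LastLogon = $now.AddDays(-90).ToFileTime() }
    [PSCustomObject]@{ SamAccountName = 'alice'; LastLogon = $now.AddDays(-5).ToFileTime() }
    [PSCustomObject]@{ SamAccountName = 'bob';   LastLogon = $now.AddDays(-120).ToFileTime() }
    [PSCustomObject]@{ SamAccountName = 'bob';   LastLogon = $now.AddDays(-75).ToFileTime() }
)
$usernames = @('alice', 'bob')
$cutoff = $now.AddDays(-60).ToFileTime()

# Build the lookup table once, before the loop
$AllLogonsTable = @{}
foreach ($entry in $AllUsers) {
    if (-not $AllLogonsTable.ContainsKey($entry.SamAccountName)) {
        $AllLogonsTable[$entry.SamAccountName] = [System.Collections.Generic.List[object]]::new()
    }
    $AllLogonsTable[$entry.SamAccountName].Add($entry)
}

# One cheap lookup per user instead of a full scan of $AllUsers
$inactiveAccounts = [System.Collections.Generic.List[object]]::new()
foreach ($user in $usernames) {
    $finalLogon = $AllLogonsTable[$user] | Sort-Object LastLogon -Descending | Select-Object -First 1
    if ($finalLogon -and $finalLogon.LastLogon -lt $cutoff) {
        $inactiveAccounts.Add($finalLogon)
    }
}
```

With this sample data only bob ends up in $inactiveAccounts, because alice's most recent logon across the DCs is newer than the cutoff.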

Run command, extract a field, run a resultant command

Apologies if this is an insanely simple question, but I'm at something of a loss.
What I'm trying to do is take a command output - in this case from NetApp DFM:
dfm event list
ID Source Name Severity Timestamp
------- ------- ------------- ----------- ------------
1 332 volume-online Normal 20 Apr 10:16
2 443 volume-online Normal 20 Apr 10:17
3 3222 volume-online Normal 20 Apr 10:18
I have about 17,000 events - I want to delete them all by ID, by running:
dfm event delete <ID>
I know exactly how I'd do this on Unix (and used to, when this was our platform):
for i in `dfm event list | awk '{print $1}'`
do
dfm event delete $i
done
For bonus points - a 'grep' type criteria? I apologise in advance for the basic nature of the question - I've tried looking on Google for a suitable example, but haven't found anything.
I've made a start by:
dfm event list > dfmevent.txt
foreach ( $line in get-content dfmevent.txt ) {
echo $line
}
But I thought I would ask if there's a better way.
I don't have access to your environment to test, but if you are just trying to get access to that first element, which is the ID, then that should be straightforward.
dfm event list | ForEach-Object{$_.Split(" ",2)[0]} | Where-Object{$_ -match '^\d+$'} | ForEach-Object{
#For Testing
Write-Host "Id: $_ will be deleted"
# Then do something
# dfm event delete $_
}
I'm sure the output is already delimited with new line so sending to file might be redundant.
We take each line and try and split it on the first space. Then pass the first element from that array. Next we ensure that element is indeed a number with a simple regex check. This will ensure that we only get numbers. I had thought about skipping the first two lines but this should work for other occurrences of text as well.
The last loop is for processing that ID. I left a Write-Host there for testing. Assuming you get the IDs you are looking for, you should just be able to uncomment that last line with dfm event delete $_
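For the 'grep'-type criteria mentioned in the question, one option is to filter the lines with -match before extracting the ID. The sketch below runs against a hard-coded copy of the sample output rather than the real dfm command (the column spacing is approximate), and the filter pattern is only an example:

```powershell
# Stand-in for the output of `dfm event list`, based on the sample in the question
$lines = @(
    'ID      Source  Name           Severity    Timestamp'
    '------- ------- -------------  ----------- ------------'
    '1       332     volume-online  Normal      20 Apr 10:16'
    '2       443     volume-online  Normal      20 Apr 10:17'
    '3       3222    volume-online  Normal      20 Apr 10:18'
)

# 1) grep-like filter on the whole line
# 2) take the first whitespace-separated field
# 3) keep only fields that are purely numeric
$ids = $lines |
    Where-Object { $_ -match 'volume-online' } |
    ForEach-Object { ($_.Trim() -split '\s+')[0] } |
    Where-Object { $_ -match '^\d+$' }

$ids | ForEach-Object { "Id: $_ would be deleted" }
```

Splitting on '\s+' after Trim() tolerates leading spaces and runs of spaces between columns, which Split(" ", 2) would not.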
Capturing the output of a DOS command into Powershell is a challenge.
Using a native snapin or module from NetApp would be easier.
might be worth checking out if that link helps
Otherwise, your method of writing to a text file and reading it back in is actually quite a good idea, this is one way of reading it back and pushing the data into the command you need.
$a = get-content dfmevent.txt
foreach ($i in $a) { if ($i.ReadCount -gt 2) { dfm event delete ($i.Substring(0,$i.IndexOf(" "))) } }
This variant only collects the IDs into the variable $result, without deleting anything:
$a = get-content dfmevent.txt
$result = @()
foreach ($i in $a) { if ($i.ReadCount -gt 2) { $result += $i.Substring(0,$i.IndexOf(" "))} }
And if you did not want to write to a text file, you could use the .NET method of capturing the output directly
$ProcessInfo = New-Object System.Diagnostics.ProcessStartInfo
$ProcessInfo.FileName = "dfm"
$ProcessInfo.RedirectStandardOutput = $true
$ProcessInfo.UseShellExecute = $false
$ProcessInfo.Arguments = "event list"
$Process = New-Object System.Diagnostics.Process
$Process.StartInfo = $ProcessInfo
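$Process.Start() | Out-Null
# Read the output before waiting: with a lot of output, waiting first can deadlock
# once the redirected output buffer fills up.
$output = $Process.StandardOutput.ReadToEnd()
$Process.WaitForExit()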
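ReadToEnd() returns the whole output as one string, so it has to be split into lines before the IDs can be pulled out. A sketch of that step, using a hard-coded string in place of the real $output:

```powershell
# Stand-in for $Process.StandardOutput.ReadToEnd(); real output would come from dfm
$output = "ID Source Name Severity Timestamp`r`n------- ------- ------------- ----------- ------------`r`n1 332 volume-online Normal 20 Apr 10:16`r`n2 443 volume-online Normal 20 Apr 10:17`r`n"

# Split on either Windows or Unix line endings, then keep the leading number of each data line
$eventIds = foreach ($line in ($output -split '\r?\n')) {
    if ($line -match '^\s*(\d+)\s') { $Matches[1] }
}
```

The header and separator lines are skipped because they do not start with a number. Each entry in $eventIds could then be passed to dfm event delete.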