I am working on a CSV file that has a start and an end date, and the requirement is to group records together when their dates overlap.
For example, in the table below, the Start_Date and End_Date of Bill_Number 177835 overlap with those of 178682, 179504, and 178990, so all of those should be grouped together, and so on for each and every record.
Bill_Number,Start_Date,End_Date
177835,4/14/20 3:00 AM,4/14/20 7:00 AM
178682,4/14/20 3:00 AM,4/14/20 7:00 AM
179504,4/14/20 3:29 AM,4/14/20 6:29 AM
178662,4/14/20 4:30 AM,4/14/20 5:30 AM
178990,4/14/20 6:00 AM,4/14/20 10:00 AM
178995,4/15/20 6:00 AM,4/15/20 10:00 AM
178998,4/15/20 6:00 AM,4/15/20 10:00 AM
I have tried different combinations like Group-Object and for loops but have not been able to produce the result.
With the above example CSV, the expected result is:
Group1: 177835,178682,179504, 178990
Group2: 177835,178682,179504, 178662
Group3: 178995, 178998
Currently I have the below code in hand.
Any help on this will be appreciated; thanks in advance.
$array = @('ab','bc','cd','df')
for ($y = 0; $y -lt $array.Count) {
for ($x = 0; $x -lt $array.Count) {
if ($array[$y] -ne $array[$x]) {
Write-Host $array[$y], $array[$x]
}
$x++
}
$y++
}
You can do something like the following. There is likely a cleaner solution, but that could take a lot of time.
$csv = Import-Csv file.csv
# Creates all inclusive groups where times overlap
$csvGroups = foreach ($row in $csv) {
$start = [datetime]$row.Start_Date
$end = [datetime]$row.End_Date
,($csv | where { ($start -ge [datetime]$_.Start_Date -and $start -le [datetime]$_.End_Date) -or ($end -ge [datetime]$_.Start_Date -and $end -le [datetime]$_.End_Date) })
}
# Removes duplicates from $csvGroups
$groups = $csvGroups | Group {$_.Bill_number -join ','} |
Foreach-Object { ,$_.Group[0] }
# Compares current group against all groups except itself
$output = for ($i = 0; $i -lt $groups.count; $i++) {
$unique = $true # indicates if the group's bill_numbers are in another group
$group = $groups[$i]
$list = $groups -as [system.collections.arraylist]
$list.RemoveAt($i) # Removes self
foreach ($innerGroup in $list) {
# If current group's bill_numbers are in another group, skip to next group
if ((compare $group.Bill_Number $innergroup.Bill_Number).SideIndicator -notcontains '<=') {
$unique = $false
break
}
}
if ($unique) {
,$group
}
}
$groupCounter = 1
# Output formatting
$output | Foreach-Object { "Group{0}:{1}" -f $groupCounter++,($_.Bill_Number -join ",")}
Explanation:
I added comments to give an idea as to what is going on.
The ,$variable syntax uses the unary comma operator. It wraps the output in an array. Typically, PowerShell unrolls an array into individual items. The unrolling becomes a problem here because we want the groups to stay as groups (arrays). Otherwise, there would be a lot of duplicate bill numbers, and we'd lose track of which belonged to which group.
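Here's a standalone sketch of that behavior (illustration only, not part of the solution):
$flat = foreach ($i in 1..2) { 1..3 }
$flat.Count      # 6 - each iteration's array was unrolled into individual items
$groups = foreach ($i in 1..2) { ,(1..3) }
$groups.Count    # 2 - the unary comma kept each group intact
$groups[0].Count # 3 - and each group still holds its own items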
An arraylist is used for $list so we can access the RemoveAt() method. A typical array is of fixed size and can't be manipulated in that fashion. The same effect can be achieved with a plain array, but the code is different: you either have to select the index ranges around the item you want to skip or create a new array using some other conditional statement that excludes the target item. An arraylist is just easier for me (personal preference).
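For example (standalone sketch):
$list = [System.Collections.ArrayList]@(10, 20, 30)
$list.RemoveAt(1)                      # $list is now 10, 30
# The fixed-array equivalent: rebuild the array around the unwanted index
$arr = 10, 20, 30
$arr = @($arr[0..0]) + @($arr[2..2])   # also 10, 30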
So, a very dirty approach. I think there are a couple of ways to determine whether there's overlap for a specific comparison, one record to another. However, you may need a list of the bill numbers each bill's date range collides with. Using a function call in a Select-Object statement/expression, I added a Collisions property to your objects.
The function is wordy and could probably be improved, but the gist is that, for each record, it compares against all other records and reports the other record's bill number in the Collisions property if either the start or end date falls within the other record's range.
This is of course just demo code; I'm sure it can be made better for your purposes, but it may be a starting point for you.
Obviously change the path to the CSV file.
Function Get-Collisions
{
Param(
[Parameter(Mandatory = $true)]
[Object]$ReferenceObject,
[Parameter( Mandatory = $true )]
[Object[]]$CompareObjects
) # End Parameter Block
ForEach($Object in $CompareObjects)
{
If( !($ReferenceObject.Bill_Number -eq $Object.Bill_Number) )
{
If(
( $ReferenceObject.Start_Date -ge $Object.Start_Date -and $ReferenceObject.Start_Date -le $Object.End_Date ) -or
( $ReferenceObject.End_Date -ge $Object.Start_Date -and $ReferenceObject.End_Date -le $Object.End_Date ) -or
( $ReferenceObject.Start_Date -le $Object.Start_Date -and $ReferenceObject.End_Date -ge $Object.Start_Date )
)
{
$Object.Bill_Number
}
}
}
} # End Get-Collisions
$Objects = Import-Csv 'C:\temp\DateOverlap.CSV'
$Objects |
ForEach-Object{
$_.Start_Date = [DateTime]$_.Start_Date
$_.End_Date = [DateTime]$_.End_Date
}
$Objects = $Objects |
Select-Object *, @{Name = 'Collisions'; Expression = { Get-Collisions -ReferenceObject $_ -CompareObjects $Objects }}
$Objects | Format-Table -AutoSize
Let me know how it goes. Thanks.
@Shan, I saw your comments, so I wanted to respond with some additional code and discussion. I may have gone overboard, but you expressed a desire to learn so that you can maintain these code pieces in the future. So, I put a lot of time into this.
I may mention some of @AdminOfThings' work too. That is not criticism, but collaboration. His example is clever and dynamic in terms of getting the job done and pulling in the right tools as he worked his way to the desired output.
I originally side-stepped the grouping question because I didn't feel like naming/numbering the groups had any meaning. For example: "Group 1" indicates all its members have overlap in their billing periods, but no indication of what or when the overlap is. Maybe I rushed through it… I may have been reading too much into it or perhaps even letting my own biases get in the way. At any rate, I elected to create a relationship from the perspective of each bill number, and that resulted in my first answer.
Since then, and because of your comment, I put effort into extending and documenting the first example I gave. The revised code will be Example 1 below. I've heavily commented it and most of the comments will apply to the original example as well. There are some differences that were forced by the extended grouping functionality, but the comments should reflect those situations.
Note: You'll also see I stopped calling them "collisions" and termed them "overlaps" instead.
Example 1:
Function Get-Overlaps
{
<#
.SYNOPSIS
Given an object (reference object) compare to a collection of other objects of the same
type. Return an array of billing numbers for which the billing period overlaps that of
the reference object.
.DESCRIPTION
Given an object (reference object) compare to a collection of other objects of the same
type. Return an array of billing numbers for which the billing period overlaps that of
the reference object.
.PARAMETER ReferenceObject
This is the current object you wish to compare to all other objects.
.PARAMETER CompareObjects
The collection of objects you want to compare with the reference object.
.NOTES
> The date/time casting could probably have been done by further preparing
the objects in the calling code. However, given this is for a
StackOverflow question, I can polish that later.
#>
Param(
[Parameter(Mandatory = $true)]
[Object]$ReferenceObject,
[Parameter( Mandatory = $true )]
[Object[]]$CompareObjects
) # End Parameter Block
[Collections.ArrayList]$Return = @()
$R_StartDate = [DateTime]$ReferenceObject.Start_Date
$R_EndDate = [DateTime]$ReferenceObject.End_Date
ForEach($Object in $CompareObjects)
{
$O_StartDate = [DateTime]$Object.Start_Date
$O_EndDate = [DateTime]$Object.End_Date
# The first if statement skips the reference object's bill_number
If( !($ReferenceObject.Bill_Number -eq $Object.Bill_Number) )
{
# This logic can use some explaining. So far as I could tell there were 2 cases to look for:
# 1) Either or both the start and end dates fall inside the timespan of the comparison
# object. These cases are handled by the first 2 conditions.
# 2) The reference object's timespan covers the entire timespan of the comparison object,
# meaning the start date is before and the end date is after, so the entire
# comparison timespan falls within the bounds of the reference timespan. I elected to use
# the 3rd condition below to detect that case because once the start date is earlier I
# only have to care if the end date is greater than the start date. It's a little more
# inclusive and partially covered by the previous conditions, but whatever, you gotta
# pick something...
#
# Note: This was a deceptively difficult thing to comprehend. I missed that last condition
# in my first example (later corrected) and I think @AdminOfThings also overlooked it.
If(
( $R_StartDate -ge $O_StartDate -and $R_StartDate -le $O_EndDate ) -or
( $R_EndDate -ge $O_StartDate -and $R_EndDate -le $O_EndDate ) -or
( $R_StartDate -le $O_StartDate -and $R_EndDate -ge $O_StartDate )
)
{
[Void]$Return.Add( $Object.Bill_Number )
}
}
}
Return $Return
} # End Get-Overlaps
$Objects =
Import-Csv 'C:\temp\DateOverlap.CSV' |
ForEach-Object{
# Consider overlap as a relationship from the perspective of a given Object.
$Overlaps = [Collections.ArrayList]@(Get-Overlaps -ReferenceObject $_ -CompareObjects $Objects)
# Knowing the overlaps I can infer the group, by adding the group's bill_number to its group property.
If( $Overlaps )
{ # Don't calculate a group unless you actually have overlaps:
$Group = $Overlaps.Clone()
[Void]$Group.Add( $_.Bill_Number ) # Could be done in the line above, but I separated it for readability.
}
Else { $Group = $null } # Ensures we're not reusing a group from a previous iteration of the loop.
# Create a new PSCustomObject with the data so far.
[PSCustomObject][Ordered]@{
Bill_Number = $_.Bill_Number
Start_Date = [DateTime]$_.Start_Date
End_Date = [DateTime]$_.End_Date
Overlaps = $Overlaps
Group = $Group | Sort-Object # Sorting will make it a lot easier to get unique lists later.
}
}
# The reason I recreated the objects from the CSV file instead of using Select-Object as I had
# previously is that I simply couldn't get Select-Object to maintain the ArrayList type being
# returned from the function. I know that's a documented problem or circumstance somewhere.
# Now I'll add one more property called Group_ID a comma delimited string that we can later use
# to echo the groups according to your original request.
$Objects =
$Objects |
Select-Object *,#{Name = 'Group_ID'; Expression = { $_.Group -join ', ' } }
# This output is just for the sake of showing the new objects:
$Objects | Format-Table -AutoSize -Wrap
# Now create an array of unique Group_ID strings; this is possible because of the sorts and joins done earlier.
$UniqueGroups = $Objects.Group_ID | Select-Object -Unique
$Num = 1
ForEach($UniqueGroup in $UniqueGroups)
{
"Group $Num : $UniqueGroup"
++$Num # Increment $Num, using the convenient unary operator, so the next group is echoed properly.
}
# Below is a traditional for loop that does the same thing. I did that first before deciding the ForEach
# was cleaner. Leaving it commented below, because you're on a learning-quest, so just more demo code...
# For($i = 0; $i -lt $UniqueGroups.Count; ++$i)
# {
# $Num = $i + 1
# $UniqueGroup = $UniqueGroups[$i]
# "Group $Num : $UniqueGroup"
# }
Example 2:
$Objects =
Import-Csv 'C:\temp\DateOverlap.CSV' |
Select-Object Bill_Number,
@{ Name = 'Start_Date'; Expression = { [DateTime]$_.Start_Date } },
@{ Name = 'End_Date'; Expression = { [DateTime]$_.End_Date } }
# The above select statement converts the Start_Date & End_Date properties to [DateTime] objects
# While you had asked to pack everything into the nested loops, that would have resulted in
# unnecessary recasting of object types to ensure proper comparison. Often this is a matter of
# preference, but in this case I think it's better. I did have it working well without the
# above select, but the code is more readable / concise with it. So even if you treat the
# Select-Object command as a blackbox the rest of the code should be easier to understand.
#
# Of course, if you couldn't tell from my samples, Select-Object is incredibly useful. I
# recommend taking the time to learn it thoroughly. The MS documentation can be found here:
# https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/select-object?view=powershell-5.1
:Outer ForEach( $ReferenceObject in $Objects )
{
# In other revisions I had assigned these values to some shorter variable names.
# I took that out. Again, since you're learning, I wanted all the dot referencing
# to be on full display.
[Collections.ArrayList]$TempArrList = @() # Reset this on each iteration of the outer loop.
:Inner ForEach( $ComparisonObject in $Objects )
{
If( $ComparisonObject.Bill_Number -eq $ReferenceObject.Bill_Number )
{ # Skip the current reference object in the $Objects collection! This prevents duplication of
# the current bill's number within its own group, helping to keep the group list unique.
#
# By now you should have seen, across all revisions including @AdminOfThings' demo, that there was some
# need to skip the current item when searching for overlaps, and that there are a number of ways to
# accomplish that. In this case I simply go back to the top of the loop when the current record
# is encountered, effectively skipping it.
Continue Inner
}
# The below logic needs some explaining. So far as I could tell there were 2 cases to look for:
# 1) Either or both the start and end dates fall inside the timespan of the comparison
# object. These cases are handled by the first 2 conditions.
# 2) The reference object's timespan covers the entire timespan of the comparison object,
# meaning the start date is before and the end date is after, so the entire
# comparison timespan falls within the bounds of the reference timespan. I elected to use
# the 3rd condition below to detect that case because once the start date is earlier I
# only have to care if the end date is greater than the other start date. It's a little
# more inclusive and partially covered by the previous conditions, but whatever, you gotta
# pick something...
#
# Note: This was a deceptively difficult thing to comprehend. I missed that last condition
# in my first example (later corrected) and I think @AdminOfThings also overlooked it.
If(
( $ReferenceObject.Start_Date -ge $ComparisonObject.Start_Date -and $ReferenceObject.Start_Date -le $ComparisonObject.End_Date ) -or
( $ReferenceObject.End_Date -ge $ComparisonObject.Start_Date -and $ReferenceObject.End_Date -le $ComparisonObject.End_Date ) -or
( $ReferenceObject.Start_Date -le $ComparisonObject.Start_Date -and $ReferenceObject.End_Date -ge $ComparisonObject.Start_Date )
)
{
[Void]$TempArrList.Add( $ComparisonObject.Bill_Number )
}
}
# Now Add the properties!
$ReferenceObject | Add-Member -Name Overlaps -MemberType NoteProperty -Value $TempArrList
If( $ReferenceObject.Overlaps )
{
[Void]$TempArrList.Add($ReferenceObject.Bill_Number)
$ReferenceObject | Add-Member -Name Group -MemberType NoteProperty -Value ( $TempArrList | Sort-Object )
$ReferenceObject | Add-Member -Name Group_ID -MemberType NoteProperty -Value ( $ReferenceObject.Group -join ', ' )
# Below a script property also works, but I think the above is easier to follow:
# $ReferenceObject | Add-Member -Name Group_ID -MemberType ScriptProperty -Value { $this.Group -join ', ' }
}
Else
{
$ReferenceObject | Add-Member -Name Group -MemberType NoteProperty -Value $null
$ReferenceObject | Add-Member -Name Group_ID -MemberType NoteProperty -Value $null
}
}
# This output is just for the sake of showing the new objects:
$Objects | Format-Table -AutoSize -Wrap
# Now create an array of unique Group_ID strings; this is possible because of the sorts and joins done earlier.
#
# It's important to point out I chose to sort because I saw the clever solution that @AdminOfThings
# used. There's a need to display only groups that have unique memberships, not necessarily unique
# orderings of the members. He identified these by doing some additional loops and using the
# Compare-Object cmdlet. Again, I must say that was very clever, and Compare-Object is another tool
# very much worth getting to know. However, the code didn't seem to care which of the various
# orderings it ultimately output. Therefore I could conclude the order wasn't really important, and
# it's fine if the groups are sorted. With the objects sorted it's much easier to derive the truly
# unique lists with the simple Select-Object command below.
$UniqueGroups = $Objects.Group_ID | Select-Object -Unique
# Finally Loop through the UniqueGroups
$Num = 1
ForEach($UniqueGroup in $UniqueGroups)
{
"Group $Num : $UniqueGroup"
++$Num # Increment $Num, using the convenient unary operator, so the next group is echoed properly.
}
Additional Discussion:
Hopefully the examples are helpful. I wanted to mention a few more points:
Using ArrayLists ( [System.Collections.ArrayList] ) instead of native arrays. The main reason to do this is speed and the flexibility to easily add and remove elements: a native array is fixed size, so every resize copies the whole thing. If you search the internet you'll find hundreds of articles explaining why ArrayLists are faster for this; it's so common you'll often find experienced PowerShell users reaching for them instinctively.
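A minimal side-by-side sketch of what that means in practice:
# A native array is fixed size: each += allocates a brand new array and copies everything over
$arr = @()
foreach ($i in 1..5) { $arr += $i }
# An ArrayList grows in place; [void] discards the index that Add() returns
$list = [System.Collections.ArrayList]@()
foreach ($i in 1..5) { [void]$list.Add($i) }
$list.RemoveAt(0)   # direct removal - a plain array has no equivalent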
You'll notice I relied heavily on the ability to append new properties to objects. There are several ways to do this: Select-Object, creating your own objects, and, in Example 2 above, Add-Member. The main reason I used Add-Member there was that I couldn't get the ArrayList type to stick when using Select-Object.
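For illustration, here is a minimal sketch of appending a property with Add-Member (the object and values are made up for the demo):
$obj = [PSCustomObject]@{ Bill_Number = 177835 }
$obj | Add-Member -MemberType NoteProperty -Name Overlaps -Value ([System.Collections.ArrayList]@(178682, 179504))
$obj.Overlaps.GetType().Name   # ArrayList - the type sticks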
Regarding loops. This is specific to your desire for nested loops. My first answer still had loops, except some were implied by the pipeline, and others were stored in a helper function. The latter is really a preference; for readability it's sometimes helpful to park some code out of view from the main code body. That said, all the same concepts were there from the beginning. You should get comfortable with the implied loop that comes with pipelining, as sketched below.
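For example, these two loops over the $Objects collection are equivalent (sketch):
foreach ($row in $Objects) { $row.Bill_Number }   # explicit loop
$Objects | ForEach-Object { $_.Bill_Number }      # the same loop, implied by the pipeline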
I don't think there's much more I can say without getting redundant. I really hope this was helpful, it was certainly fun for me to work on it. If you have questions or feedback let me know. Thanks.
Is this possible?
I'm brand new to PowerShell and am currently in the process of converting a VBScript script to PowerShell. The following one-liner seems to do exactly what the entire VBScript does:
Repadmin /istg
which outputs
Repadmin: running command /istg against full DC ST-DC7.somestuff.com
Gathering topology from site BR-CORP (ST-DC7.somestuff.com):
Site ISTG
================== =================
Portland ST-DC4
Venyu ST-DC5
BR-Office ST-DC3
BR-CORP ST-DC7
The problem is I need to return this info (namely the last 4 lines) as objects which contain a "Site" and "ISTG" field. I tried the following:
$returnValues = Repadmin /istg
$returnValues
But this didn't return anything (possibly because Repadmin writes out the lines instead of actually returning the data?).
Is there a way to get the Info from "Repadmin /istg" into an array?
Here's one possible way, using regular expressions:
$output = repadmin /istg
for ( $n = 10; $n -lt $output.Count; $n++ ) {
if ( $output[$n] -ne "" ) {
$output[$n] | select-string '\s*(\S*)\s*(\S*)$' | foreach-object {
$site = $_.Matches[0].Groups[1].Value
$istg = $_.Matches[0].Groups[2].Value
}
new-object PSObject -property @{
"Site" = $site
"ISTG" = $istg
} | select-object Site,ISTG
}
}
You have to start parsing at item 10 of the output and ignore empty lines, because repadmin.exe seems to insert superfluous line breaks (or at least, PowerShell thinks so).
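To see what the pattern captures, you can test it against a single line of the sample output (values taken from the table above):
'Portland                ST-DC4' -match '\s*(\S*)\s*(\S*)$'   # True
$Matches[1]   # Portland
$Matches[2]   # ST-DC4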
I'm using Compare-Object in PowerShell to compare two XML files. It adequately displays the differences between the two using <= and =>. My problem is that I want to see the difference in context. Since it's XML, one line, one node, is different, but I don't know where that lives in the overall document. If I could grab say 5 lines before and 3 lines after it, it would give me enough information to understand what it is in context. Any ideas?
You can start from something like this:
$a = gc a.xml
$b = gc b.xml
if ($a.Length -ne $b.Length)
{ "File lenght is different" }
else
{
for ( $i = 0; $i -lt $a.Length; $i++) # -lt, not -le: valid indexes are 0..Length-1
{
If ( $a[$i] -ne $b[$i] ) # -ne compares the lines as text; -notmatch would treat $b[$i] as a regex
{
#for more context change the range i.e.: -2..2
-1..1 | % { "Line number {0}: value in file a is {1} - value in file b {2}" -f ($i+$_),$a[$i+$_], $b[$i+$_] }
" "
}
}
}
Compare-Object comes with an IncludeEqual parameter that might give what you are looking for:
[xml]$aa = "<this>
<they>1</they>
<they>2></they>
</this>"
[xml]$bb = "<this>
<they>1</they>
<they>2</they>
</this>"
Compare-Object $aa.this.they $bb.this.they -IncludeEqual
Result
InputObject SideIndicator
----------- -------------
1 ==
2 =>
2> <=
I frequently have to search server log files in a directory that may contain 50 or more files of 200+ MB each. I've written a function in Powershell to do this searching. It finds and extracts all the values for a given query parameter. It works great on an individual large file or a collection of small files but totally bites in the above circumstance, a directory of large files.
The function takes a parameter, which consists of the query parameter to be searched.
In pseudo-code:
Take parameter (e.g. someParam or someParam=([^& ]+))
Create a regex (if one is not supplied)
Collect a directory list of *.log, pipe to Select-String
For each pipeline object, add the matchers to a hash as keys
Increment a match counter
Call GC
At the end of the pipelining:
if (hash has keys)
enumerate the hash keys,
sort and append to string array
set-content the string array to a file
print summary to console
exit
else
print summary to console
exit
Here's a stripped-down version of the file processing.
$wtmatches = @{};
gci -Filter *.log | Select-String -Pattern $searcher |
%{ $wtmatches[$_.Matches[0].Groups[1].Value]++; $items++; [GC]::Collect(); }
I'm just using an old Perl trick of de-duplicating found items by making them the keys of a hash. Perhaps this is an error, but a typical output of the processing is going to be around 30,000 items at most. More normally, the number of found items is in the low thousands. From what I can see, the number of keys in the hash does not affect processing time; it is the size and number of the files that breaks it. I recently threw in the GC call in desperation; it does have some positive effect, but it is marginal.
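The trick in isolation looks like this (sketch with made-up values):
$seen = @{}
'a', 'b', 'a', 'c' | ForEach-Object { $seen[$_]++ }   # duplicate keys simply increment in place
$seen.Keys | Sort-Object                              # a, b, c - the de-duplicated values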
The issue is that with the large collection of large files, the processing sucks the RAM pool dry in about 60 seconds. It doesn't actually use a lot of CPU, interestingly, but there's a lot of volatile storage going on. Once the RAM usage has gone up over 90%, I can just punch out and go watch TV. It could take hours to complete the processing to produce a file with 15,000 or 20,000 unique values.
I would like advice and/or suggestions for increasing the efficiency, even if that means using a different paradigm to accomplish the processing. I went with what I know. I use this tool on almost a daily basis.
Oh, and I'm committed to using PowerShell. ;-) This function is part of a complete module I've written for my job, so suggestions of Python, Perl, or other useful languages are not useful in this case.
Thanks.
mp
Update:
Using latkin's ProcessFile function, I used the following wrapper for testing. His function is orders of magnitude faster than my original.
function Find-WtQuery {
<#
.Synopsis
Takes a parameter with a capture regex and a wildcard for files list.
.Description
This function is intended to be used on large collections of large files that have
the potential to take an unacceptably long time to process using other methods. It
requires that a regex capture group be passed in as the value to search for.
.Parameter Target
The parameter with capture group to find, e.g. WT.z_custom=([^ &]+).
.Parameter Files
The file wildcard to search, e.g. '*.log'
.Outputs
An object with an array of unique values and a count of total matched lines.
#>
param(
[Parameter(Mandatory = $true)] [string] $target,
[Parameter(Mandatory = $false)] [string] $files
)
begin{
$stime = Get-Date
}
process{
$results = gci -Filter $files | ProcessFile -Pattern $target -Group 1;
}
end{
$etime = Get-Date;
$ptime = $etime - $stime;
Write-Host ("Processing time for {0} files was {1}:{2}:{3}." -f (gci
-Filter $files).Count, $ptime.Hours,$ptime.Minutes,$ptime.Seconds);
return $results;
}
}
The output:
clients:\test\logs\global
{powem} [4] --> Find-WtQuery -target "WT.ets=([^ &]+)" -files "*.log"
Processing time for 53 files was 0:1:35.
Thanks to all for comments and help.
Here's a function that will hopefully speed up and reduce the memory impact of the file processing part. It will return an object with 2 properties: The total count of lines matched, and a sorted array of unique strings from the match group specified. (From your description it sounds like you don't really care about the count per string, just the string values themselves)
function ProcessFile
{
param(
[Parameter(ValueFromPipeline = $true, Mandatory = $true)]
[System.IO.FileInfo] $File,
[Parameter(Mandatory = $true)]
[string] $Pattern,
[Parameter(Mandatory = $true)]
[int] $Group
)
begin
{
$regex = new-object Regex @($pattern, 'Compiled')
$set = new-object 'System.Collections.Generic.SortedDictionary[string, int]'
$totalCount = 0
}
process
{
try
{
$reader = new-object IO.StreamReader $_.FullName
while( ($line = $reader.ReadLine()) -ne $null)
{
$m = $regex.Match($line)
if($m.Success)
{
$set[$m.Groups[$group].Value] = 1
$totalCount++
}
}
}
finally
{
$reader.Close()
}
}
end
{
new-object psobject -prop @{TotalCount = $totalCount; Unique = ([string[]]$set.Keys)}
}
}
You can use it like this:
$results = dir *.log | ProcessFile -Pattern 'stuff (capturegroup)' -Group 1
"Total matches: $($results.TotalCount)"
$results.Unique | Out-File .\Results.txt
IMO @latkin's approach is the way to go if you want to do this within PowerShell and not use some dedicated tool. I made a few changes, though, to make the command play better with respect to accepting pipeline input. I also modified the regex to search for all matches on a particular line. Neither approach searches across multiple lines, although that scenario would be pretty easy to handle as long as the pattern only ever spanned a few lines. Here's my take on the command (put it in a file called Search-File.ps1):
[CmdletBinding(DefaultParameterSetName="Path")]
param(
[Parameter(Mandatory=$true, Position=0)]
[ValidateNotNullOrEmpty()]
[string]
$Pattern,
[Parameter(Mandatory=$true, Position=1, ParameterSetName="Path",
ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true,
HelpMessage="Path to ...")]
[ValidateNotNullOrEmpty()]
[string[]]
$Path,
[Alias("PSPath")]
[Parameter(Mandatory=$true, Position=1, ParameterSetName="LiteralPath",
ValueFromPipelineByPropertyName=$true,
HelpMessage="Path to ...")]
[ValidateNotNullOrEmpty()]
[string[]]
$LiteralPath,
[Parameter()]
[ValidateRange(0, [int]::MaxValue)]
[int]
$Group = 0
)
Begin
{
Set-StrictMode -Version latest
$count = 0
$matched = @{}
$regex = New-Object System.Text.RegularExpressions.Regex $Pattern,'Compiled'
}
Process
{
if ($psCmdlet.ParameterSetName -eq "Path")
{
# In the -Path (non-literal) case we may need to resolve a wildcarded path
$resolvedPaths = @($Path | Resolve-Path | Convert-Path)
}
else
{
# Must be -LiteralPath
$resolvedPaths = @($LiteralPath | Convert-Path)
}
foreach ($rpath in $resolvedPaths)
{
Write-Verbose "Processing $rpath"
$stream = new-object System.IO.FileStream $rpath,'Open','Read','Read',4096
$reader = new-object System.IO.StreamReader $stream
try
{
while (($line = $reader.ReadLine()) -ne $null)
{
$matchColl = $regex.Matches($line)
foreach ($match in $matchColl)
{
$count++
$key = $match.Groups[$Group].Value
if ($matched.ContainsKey($key))
{
$matched[$key]++
}
else
{
$matched[$key] = 1;
}
}
}
}
finally
{
$reader.Close()
}
}
}
End
{
new-object psobject -Property @{TotalCount = $count; Matched = $matched}
}
I ran this against my IIS log dir (8.5 GB and ~1000 files) to find all the IP addresses in all the logs e.g.:
$r = ls . -r *.log | C:\Users\hillr\Search-File.ps1 '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
This took 27 minutes on my system and found 54356330 matches:
$r.Matched.GetEnumerator() | sort Value -Descending | select -f 20
Name Value
---- -----
xxx.140.113.47 22459654
xxx.29.24.217 13430575
xxx.29.24.216 13321196
xxx.140.113.98 4701131
xxx.40.30.254 53724
A while back I was reading about multi-variable assignments in PowerShell. They let you do things like this:
64 > $a,$b,$c,$d = "A four word string".split()
65 > $a
A
66 > $b
four
Or you can swap variables in a single statement
$a,$b = $b,$a
What little known nuggets of PowerShell have you come across that you think may not be as well known as they should be?
The $$ variable. I often have to do repeated operations on the same file path, for instance checking out a file and then opening it up in VIM. The $$ feature (it holds the last token of the previous command line) makes this trivial:
PS> tf edit some\really\long\file\path.cpp
PS> gvim $$
It's short and simple but it saves a lot of time.
By far the most powerful feature of PowerShell is its ScriptBlock support. The fact that you can so concisely pass around what are effectively anonymous methods without any type constraints is about as powerful as C++ function pointers and as easy as C# or F# lambdas.
I mean, how cool is it that using ScriptBlocks you can implement a "using" statement (which PowerShell doesn't have inherently)? Or, pre-v2, you could even implement try-catch-finally.
function Using([Object]$Resource,[ScriptBlock]$Script) {
try {
&$Script
}
finally {
if ($Resource -is [IDisposable]) { $Resource.Dispose() }
}
}
Using ($File = [IO.File]::CreateText("$PWD\blah.txt")) {
$File.WriteLine(...)
}
How cool is that!
A feature that I find is often overlooked is the ability to pass a file to a switch statement.
Switch will iterate through the lines and match each one against strings (or regular expressions with the -regex parameter), the contents of variables, or numbers; the line can also be passed into an expression to be evaluated as $true or $false.
switch -file 'C:\test.txt'
{
'sometext' {Do-Something}
$pwd {Do-SomethingElse}
42 {Write-Host "That's the answer."}
{Test-Path $_} {Do-AThirdThing}
default {'Nothing else matched'}
}
$OFS - output field separator. A handy way to specify how array elements are separated when rendered to a string:
PS> $OFS = ', '
PS> "$(1..5)"
1, 2, 3, 4, 5
PS> $OFS = ';'
PS> "$(1..5)"
1;2;3;4;5
PS> $OFS = $null # set back to default
PS> "$(1..5)"
1 2 3 4 5
Always guaranteeing you get an array result. Consider this code:
PS> $files = dir *.iMayNotExist
PS> $files.length
$files in this case may be $null, a scalar value, or an array of values. $files.length isn't going to give you the number of files found for $null or for a single file. In the single-file case, you will get the file's size!! Whenever I'm not sure how much data I'll get back, I always enclose the command in an array subexpression like so:
PS> $files = @(dir *.iMayNotExist)
PS> $files.length # always returns number of files in array
Then $files will always be an array. It may be empty or contain only a single element, but it will be an array. This makes reasoning about the result much simpler.
Array covariance support:
PS> $arr = '127.0.0.1','192.168.1.100','192.168.1.101'
PS> $ips = [system.net.ipaddress[]]$arr
PS> $ips | ft IPAddressToString, AddressFamily -auto
IPAddressToString AddressFamily
----------------- -------------
127.0.0.1 InterNetwork
192.168.1.100 InterNetwork
192.168.1.101 InterNetwork
Comparing arrays using Compare-Object:
PS> $preamble = [System.Text.Encoding]::UTF8.GetPreamble()
PS> $preamble | foreach {"0x{0:X2}" -f $_}
0xEF
0xBB
0xBF
PS> $fileHeader = Get-Content Utf8File.txt -Enc byte -Total 3
PS> $fileheader | foreach {"0x{0:X2}" -f $_}
0xEF
0xBB
0xBF
PS> @(Compare-Object $preamble $fileHeader -sync 0).Length -eq 0
True
For more stuff like this, check out my free eBook - Effective PowerShell.
Along the lines of multi-variable assignments.
$list = 1,2,3,4
While($list) {
$head, $list = $list
$head
}
1
2
3
4
I've been using this:
if (!$?) { # if previous command was not successful
Do some stuff
}
and I also use $_ (current pipeline object) quite a bit, but these might be more known than other stuff.
Many operators work on arrays as well: comparison operators return the elements for which the comparison is true, and other operators act on each element of the array independently:
1..1000 -lt 800 -gt 400 -like "?[5-9]0" -replace 0 -as "int[]" -as "char[]" -notmatch "\d"
This is faster than Where-Object.
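For comparison, here is the same filter written both ways (only the first part of the chain, as a sketch):
1..1000 -lt 800 -gt 400                                 # operators filter the array directly
1..1000 | Where-Object { $_ -lt 800 -and $_ -gt 400 }   # same result, one scriptblock call per element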
Not a language feature, but super helpful:
F8 -- takes the text you have typed so far and searches your history for a command that starts with it.
Tab-search through your history with #
Example:
PS> Get-Process explorer
PS> "Ford Explorer"
PS> "Magellan" | Add-Content "great explorers.txt"
PS> type "great explorers.txt"
PS> #expl <-- Hit the <tab> key to cycle through history entries that have the term "expl"
Love this thread. I could list a ton of things after reading Windows PowerShell in Action. There's a disconnect between that book and the documentation. I actually tried to list them all somewhere else here, but got put on hold for "not being a question".
I'll start with foreach with three script blocks (begin/process/end):
Get-ChildItem | ForEach-Object {$sum=0} {$sum++} {$sum}
Speaking of swapping two variables, here's swapping two files:
${c:file1.txt},${c:file2.txt} = ${c:file2.txt},${c:file1.txt}
Search and replace a file:
${c:file.txt} = ${c:file.txt} -replace 'oldstring','newstring'
Using assembly and using namespace statements:
using assembly System.Windows.Forms
using namespace System.Windows.Forms
[messagebox]::show('hello world')
A shorter version of foreach, with properties and methods
ps | foreach name
'hi.there' | Foreach Split .
Use the $() subexpression operator outside of strings to combine two statements:
$( echo hi; echo there ) | measure
Get-Content/Set-Content with variables:
$a = ''
get-content variable:a | set-content -value there
Anonymous functions:
1..5 | & {process{$_ * 2}}
Give the anonymous function a name:
$function:timestwo = {process{$_ * 2}}
Anonymous function with parameters:
& {param($x,$y) $x+$y} 2 5
You can stream from foreach () with these, where normally you can't:
& { foreach ($i in 1..10) {$i; sleep 1} } | out-gridview
Run processes in the background, like Unix '&', and then wait for them:
$a = start-process -NoNewWindow powershell {timeout 10; 'done a'} -PassThru
$b = start-process -NoNewWindow powershell {timeout 10; 'done b'} -PassThru
$c = start-process -NoNewWindow powershell {timeout 10; 'done c'} -PassThru
$a,$b,$c | wait-process
Or foreach -parallel in workflows:
workflow work {
foreach -parallel ($i in 1..3) {
sleep 5
"$i done"
}
}
work
Or a workflow parallel block where you can run different things:
function sleepfor($time) { sleep $time; "sleepfor $time done"}
workflow work {
parallel {
sleepfor 3
sleepfor 2
sleepfor 1
}
'hi'
}
work
Three parallel commands in three more runspaces with the api:
$a = [PowerShell]::Create().AddScript{sleep 5;'a done'}
$b = [PowerShell]::Create().AddScript{sleep 5;'b done'}
$c = [PowerShell]::Create().AddScript{sleep 5;'c done'}
$r1,$r2,$r3 = ($a,$b,$c).begininvoke()
$a.EndInvoke($r1); $b.EndInvoke($r2); $c.EndInvoke($r3) # wait
($a,$b,$c).Streams.Error # check for errors
($a,$b,$c).dispose() # cleanup
Parallel processes with invoke-command, but you have to be at an elevated prompt with remote powershell working:
invoke-command localhost,localhost,localhost { sleep 5; 'hi' }
An assignment is an expression:
if ($a = 1) { $a }
$a = $b = 2
Get last array element with -1:
(1,2,3)[-1]
Discard output with [void]:
[void] (echo discard me)
Switch on arrays and $_ on either side:
switch(1,2,3,4,5,6) {
{$_ % 2} {"Odd $_"; continue}
4 {'FOUR'}
default {"Even $_"}
}
Get and set variables in a module:
'$script:count = 0
$script:increment = 1
function Get-Count { return $script:count += $increment }' > counter.psm1 # creating file
import-module .\counter.psm1
$m = get-module counter
& $m Get-Variable count
& $m Set-Variable count 33
See module function definition:
& $m Get-Item function:Get-Count | foreach definition
Run a command with a commandinfo object and the call operator:
$d = get-command get-date
& $d
Dynamic modules:
$m = New-Module {
function foo {"In foo x is $x"}
$x=2
Export-ModuleMember -func foo -var x
}
flags enum:
[flags()] enum bits {one = 1; two = 2; three = 4; four = 8; five = 16}
[bits]31
Little known codes for the -replace operator:
$number Substitutes the last submatch matched by group number.
${name} Substitutes the last submatch matched by a named capture of the form (?<name>...).
$$ Substitutes a single "$" literal.
$& Substitutes a copy of the entire match itself.
$` Substitutes all the text from the argument string before the matching portion.
$' Substitutes all the text of the argument string after the matching portion.
$+ Substitutes the last submatch captured.
$_ Substitutes the entire argument string.
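A few of those tokens in action (sketch; the single quotes keep PowerShell itself from expanding the $):
'price 30' -replace '\d+', '$&USD'                        # $& is the whole match -> price 30USD
'2024-05' -replace '(?<y>\d{4})-(?<m>\d+)', '${m}/${y}'   # named groups -> 05/2024
'abc' -replace 'b', '$$'                                  # $$ emits a literal dollar sign -> a$c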
Demo of workflows surviving interruptions using checkpoints. Kill the window or reboot. Then start PS again. Use get-job and resume-job to resume the job.
workflow test1 {
foreach ($b in 1..1000) {
$b
Checkpoint-Workflow
}
}
test1 -AsJob -JobName bootjob
Emacs edit mode. Pressing Tab lists all the completion options at once. Very useful.
Set-PSReadLineOption -EditMode Emacs
Any command that begins with "get-", you can leave off the "get-":
date
help
The stop-parsing token --% and the end-of-parameters token --:
write-output --% -inputobject
write-output -- -inputobject
Tab completion on wildcards:
cd \pro*iles # press tab
Compile and import a C# module with a cmdlet inside, even on macOS:
Add-Type -Path ExampleModule.cs -OutputAssembly ExampleModule.dll
Import-Module ./ExampleModule.dll
To iterate backwards over a sequence, put the length of the sequence on one side of the range and 1 on the other:
foreach ($x in $seq.Length..1) { Do-Something $seq[$x - 1] }