Sort an array containing a lot of dates quickly - powershell

I have a huge array of date strings in the format dd.MM.yyyy. I know how to sort the array with Sort-Object, but the sorting takes a long time. I found another way of sorting arrays, but it doesn't work as expected.
My original code to sort the array looked like this:
$data | Sort-Object { [System.DateTime]::ParseExact($_, "dd.MM.yyyy", $null) }
But as I said before, this way of sorting is too slow. The Sort() method of System.Array seems to be much faster.
[Array]::Sort([array]$array)
This code sorts an array of strings much faster than Sort-Object. Is there a way to make this Sort() call order the entries by date, the way the Sort-Object version does?

The .NET method will work for dates if you make sure that the array is of type DateTime.
Meaning you should use
[DateTime[]]$dateArray
instead of
[Array]$dateArray
when you create it. Then you can use
[Array]::Sort($dateArray)
to perform the sort itself.
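As a rough sketch of that idea (the sample dates and variable names below are made up for illustration), the strings can be parsed once into a DateTime[] array that serves as the sort keys, and the two-array overload of [Array]::Sort then reorders the original strings alongside the keys:

```powershell
# Hypothetical sample input in dd.MM.yyyy form.
$data = '24.12.2013', '01.02.1999', '15.06.2005'

# Parse every string exactly once into a strongly typed DateTime array.
$culture = [System.Globalization.CultureInfo]::InvariantCulture
[DateTime[]]$keys = foreach ($s in $data) {
    [DateTime]::ParseExact($s, 'dd.MM.yyyy', $culture)
}

# Work on a copy so the original array keeps its order.
[string[]]$sorted = $data.Clone()

# Sort the keys and reorder the strings in parallel with them.
[Array]::Sort($keys, $sorted)

$sorted
```

The parsing cost is still paid once per element, but the sort itself runs inside .NET without invoking a script block for each item.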

Your input data are date strings with a date format that doesn't allow sorting in "date" order. You must convert the strings either to actual dates
Get-Date $_
[DateTime]::ParseExact($_, "dd.MM.yyyy", $null)
or change the format of the string dates to ISO format, which does allow sorting in date order.
'{2}-{1}-{0}' -f ($_ -split '\.')
'{0}-{1}-{2}' -f $_.Substring(6,4), $_.Substring(3,2), $_.Substring(0,2)
$_ -replace '(\d+)\.(\d+)\.(\d+)', '$3-$2-$1'
At some point you must do one of these conversions, either when creating the data or when sorting.
I ran some performance tests for each conversion, and string transformation using the Substring() method seems to be the fastest way:
PS C:\> $dates = 1..10000 | % {
>> $day = Get-Random -Min 1 -Max 28
>> $month = Get-Random -Min 1 -Max 12
>> $year = Get-Random -Min 1900 -Max 2014
>> '{0:d2}.{1:d2}.{2}' -f $day, $month, $year
>> }
>>
PS C:\> Measure-Command { $dates | sort {Get-Date $_} }
Days : 0
Hours : 0
Minutes : 0
Seconds : 1
Milliseconds : 520
Ticks : 15200396
TotalDays : 1,75930509259259E-05
TotalHours : 0,000422233222222222
TotalMinutes : 0,0253339933333333
TotalSeconds : 1,5200396
TotalMilliseconds : 1520,0396
PS C:\> Measure-Command { $dates | sort {'{2}-{1}-{0}' -f ($_ -split '.')} }
Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 413
Ticks : 4139027
TotalDays : 4,79054050925926E-06
TotalHours : 0,000114972972222222
TotalMinutes : 0,00689837833333333
TotalSeconds : 0,4139027
TotalMilliseconds : 413,9027
PS C:\> Measure-Command { $dates | sort {$_ -replace '(\d+)\.(\d+).(\d+)', '$3-$2-$1'} }
Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 348
Ticks : 3488962
TotalDays : 4,03815046296296E-06
TotalHours : 9,69156111111111E-05
TotalMinutes : 0,00581493666666667
TotalSeconds : 0,3488962
TotalMilliseconds : 348,8962
PS C:\> Measure-Command { $dates | sort {[DateTime]::ParseExact($_, "dd.MM.yyyy", $null)} }
Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 340
Ticks : 3408966
TotalDays : 3,9455625E-06
TotalHours : 9,46935E-05
TotalMinutes : 0,00568161
TotalSeconds : 0,3408966
TotalMilliseconds : 340,8966
PS C:\> Measure-Command { $dates | sort {'{0}-{1}-{2}' -f $_.Substring(6,4), $_.Substring(3,2), $_.Substring(0,2)} }
Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 292
Ticks : 2926835
TotalDays : 3,38754050925926E-06
TotalHours : 8,13009722222222E-05
TotalMinutes : 0,00487805833333333
TotalSeconds : 0,2926835
TotalMilliseconds : 292,6835

Splitting numbers and letters from a string misses zeroes

I am processing a time value that arrives in a string that starts with a T then is followed by a series of numbers immediately followed by the unit for each number. For example, 8 hours and 22 minutes will come in as T8H22M and 3 hours, 2 minutes, and 1 second will be T3H2M1S.
I need to split this encoded value into separate H/M/S columns, but I am having issues getting zeroes to work properly. I am using this code from another Question here with similar (but not identical) requirements.
This script:
Write-Host $('T8H22M' -split { [bool]($_ -as [double]) })
Write-Host $('T8H22M' -split { ! [bool]($_ -as [double]) })
Write-Host $('T8H0M' -split { [bool]($_ -as [double]) })
Write-Host $('T8H0M' -split { ! [bool]($_ -as [double]) })
Write-Host $('T0H' -split { $_ -eq '0' })
Produces this output:
T H M
8 22
T H0M
8
T H
As you can see, any time the numerical value is zero the string just doesn't split. This is a real problem as zero-time comes in as T0H which just won't parse using the method above.
How can I modify the code above to also split out zero values?
You could use a regex to extract the values. For example:
'T3H2M1S','T8H22M','T8H0M','T0H' |
ForEach-Object {
if($_ -match '^T((?<hours>\d+)H)*((?<minutes>\d+)M)*((?<seconds>\d+)S)*$') {
[PsCustomObject]@{
TimeString = $matches.0
Hours = [Int]$matches.hours
Minutes = [Int]$matches.minutes
Seconds = [Int]$matches.seconds
}
}
}
This produces an object for each time string, with Hours, Minutes and Seconds properties corresponding to the related part of that string:
TimeString Hours Minutes Seconds
---------- ----- ------- -------
T3H2M1S 3 2 1
T8H22M 8 22 0
T8H0M 8 0 0
T0H 0 0 0
Of course, you can change the contents of the if to manipulate the values however you like, not just creating an object - the key is that the RegEx should* split it correctly for you.
* - I say 'should' because I only tested with the example strings you give, so be sure to test more thoroughly yourself.
The time values you have resemble the ISO 8601 duration format except they all lack the leading P.
You can use the ToTimeSpan() method of the .NET class System.Xml.XmlConvert to convert these to TimeSpans:
'T3H2M1S','T8H22M','T8H0M','T0H' | ForEach-Object {
[System.Xml.XmlConvert]::ToTimeSpan("P$_") # prepend a 'P'
}
This returns TimeSpan objects such as:
Days : 0
Hours : 8
Minutes : 22
Seconds : 0
Milliseconds : 0
Ticks : 301200000000
TotalDays : 0,348611111111111
TotalHours : 8,36666666666667
TotalMinutes : 502
TotalSeconds : 30120
TotalMilliseconds : 30120000
Days : 0
Hours : 8
Minutes : 0
Seconds : 0
Milliseconds : 0
Ticks : 288000000000
TotalDays : 0,333333333333333
TotalHours : 8
TotalMinutes : 480
TotalSeconds : 28800
TotalMilliseconds : 28800000
Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 0
Ticks : 0
TotalDays : 0
TotalHours : 0
TotalMinutes : 0
TotalSeconds : 0
TotalMilliseconds : 0
Now you can use any of the properties to format as you like.
To complement the answers from @boxdog and @Theo, here is the answer to the actual question of why the zeros are missing and "How can I modify the code above to also split out zero values?"
Quote on the <ScriptBlock> parameter
<ScriptBlock>
An expression that specifies rules for applying the delimiter. The
expression must evaluate to $true or $false. Enclose the script block
in braces.
The point here is that the 0 evaluates to $False ("falsy"):
if (0) { 'True' } else { 'False' }
False
if (1) { 'True' } else { 'False' }
True
[Bool]0
False
[Bool]1
True
In other words you will need to compare your split expression against $Null to get what you are looking for:
Write-Host $('T8H22M' -split { $Null -ne ($_ -as [double]) })
T H M
Write-Host $('T8H22M' -split { $Null -eq ($_ -as [double]) })
8 22
Write-Host $('T8H0M' -split { $Null -ne ($_ -as [double]) })
T H M
Write-Host $('T8H0M' -split { $Null -eq ($_ -as [double]) })
8 0
Anyway, I recommend going with the solution from @boxdog or @Theo.

PowerShell: is there any way to search a large text file faster?

$File="C:\temp\test\ID.txt"
$line="ART.023.AGA_203.PL"
Measure-Command {$Sel = Select-String -pattern $line -path $File }
Measure-Command {
$reader = New-Object System.IO.StreamReader($File)
$content = $reader.ReadToEnd().Split("`n")
$results = $content | select-string -Pattern $line
}
Measure-Command {
$content= get-content $File
$results = $content | select-string -Pattern $line
$results
}
Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 197
Ticks : 1970580
TotalDays : 2.28076388888889E-06
TotalHours : 5.47383333333333E-05
TotalMinutes : 0.0032843
TotalSeconds : 0.197058
TotalMilliseconds : 197.058
Days : 0
Hours : 0
Minutes : 0
Seconds : 4
Milliseconds : 135
Ticks : 41350664
TotalDays : 4.78595648148148E-05
TotalHours : 0.00114862955555556
TotalMinutes : 0.0689177733333333
TotalSeconds : 4.1350664
TotalMilliseconds : 4135.0664
Days : 0
Hours : 0
Minutes : 0
Seconds : 4
Milliseconds : 926
Ticks : 49265692
TotalDays : 5.70204768518518E-05
TotalHours : 0.00136849144444444
TotalMinutes : 0.0821094866666667
TotalSeconds : 4.9265692
TotalMilliseconds : 4926.5692
I want to search for about 10,000 different $line values in $File.
The search is very slow. Is there a faster way?
Example:
Searching for the keyword in $line should return the matching line from $File.
Keyword:ART.023.AGA_203.PL
file:2,45433;ART.023.AGA_203.PL;dddd;wwww;tt;
How does this compare to your other methods?
Measure-Command {
Get-Content $file -ReadCount 1000 |
foreach {$_ -match $line}
}
Note: when comparison testing operations that do disk reads like this, always run multiple tests and discard the first one. If the disk has any on-board read cache, the first test can pre-load the cache for subsequent tests and skew the results.
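When there are thousands of keywords, another option is to read the file only once and test each line against a hash set of keywords instead of running one search per keyword. The sketch below assumes that the keyword is always the second semicolon-separated field, as in the sample line from the question; the sample file, its contents, and the variable names are invented for illustration:

```powershell
# Hypothetical sample file in the format shown in the question.
$File = Join-Path ([System.IO.Path]::GetTempPath()) 'ID-sample.txt'
Set-Content -Path $File -Value @(
    '1,11111;ART.001.XYZ_100.PL;aaaa;bbbb;cc;'
    '2,45433;ART.023.AGA_203.PL;dddd;wwww;tt;'
    '3,99999;ART.777.QQQ_001.PL;eeee;ffff;gg;'
)

# Put all keywords into a hash set so each lookup is a single hash probe.
$keywords = New-Object 'System.Collections.Generic.HashSet[string]'
foreach ($k in 'ART.023.AGA_203.PL', 'ART.777.QQQ_001.PL') {
    [void]$keywords.Add($k)
}

# Read the file once and keep rows whose second field is a known keyword.
$results = foreach ($row in [System.IO.File]::ReadLines($File)) {
    $fields = $row.Split(';')
    if ($fields.Length -gt 1 -and $keywords.Contains($fields[1])) { $row }
}

$results
Remove-Item $File
```

If the keyword can appear anywhere in the line, this exact-field lookup does not apply, and passing the whole keyword list to Select-String with -SimpleMatch may be the simpler route.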

In Powershell what is the most efficient way to generate a range interval?

Here is one example, but there must be a more efficient way:
1..100|%{$temp=$_;$temp%=3;if ($temp -eq 0){$_} }
1..100 | Where-Object {$_ % 3 -eq 0}
I would guess that the "most efficient" way would be to use a plain old for loop:
for($i=3; $i -le 100; $i +=3){$i}
Though that's not very elegant. You could create a function:
function range($start,$end,$interval) {for($i=$start; $i -le $end; $i +=$interval){$i}}
Timing this against your method (using the more pithy version from the other answer):
# ~> measure-command {1..100 | Where-Object {$_ % 3 -eq 0}}
Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 7
Ticks : 76020
TotalDays : 8.79861111111111E-08
TotalHours : 2.11166666666667E-06
TotalMinutes : 0.0001267
TotalSeconds : 0.007602
TotalMilliseconds : 7.602
# ~> measure-command{range 3 100 3}
Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 0
Ticks : 6197
TotalDays : 7.1724537037037E-09
TotalHours : 1.72138888888889E-07
TotalMinutes : 1.03283333333333E-05
TotalSeconds : 0.0006197
TotalMilliseconds : 0.6197

New-TimeSpan cmdlet in powershell

How can I use the New-TimeSpan cmdlet to calculate the total time taken for script execution? I tried a sample like this:
$val=Get-Date
$start=New-TimeSpan -Start $val
$val2=Get-Date
$end=New-TimeSpan -End $val2
$diff=New-TimeSpan -Start $start -End $end
But I ended up with the following error:
New-TimeSpan : Cannot bind parameter 'Start'. Cannot convert the "00:00:08.7110000" value of type "System.TimeSpan" to type "System.DateTime".
You don't need to use New-TimeSpan; just subtract the DateTime objects:
$script_start = Get-Date
Start-Sleep -Seconds 5
$script_end = Get-Date
$script_end - $script_start
This will create a TimeSpan object.
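A different technique from the subtraction above, shown here only as a sketch, is the .NET System.Diagnostics.Stopwatch class, which also yields a TimeSpan through its Elapsed property. The sleep is just a stand-in for the real script body:

```powershell
# Start timing.
$timer = [System.Diagnostics.Stopwatch]::StartNew()

# Stand-in for the actual work of the script.
Start-Sleep -Milliseconds 200

# Stop timing and read the elapsed time as a TimeSpan.
$timer.Stop()
$elapsed = $timer.Elapsed
$elapsed.TotalSeconds
```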
You could use Measure-Command. It returns a timespan object. Example:
PS C:\> Measure-Command -Expression {1..10000000}
Days : 0
Hours : 0
Minutes : 0
Seconds : 1
Milliseconds : 714
Ticks : 17149279
TotalDays : 1.98487025462963E-05
TotalHours : 0.000476368861111111
TotalMinutes : 0.0285821316666667
TotalSeconds : 1.7149279
TotalMilliseconds : 1714.9279

Converting time 121.419419 to readable minutes/seconds

I'd like to calculate the time my script runs, but my result from Get-Date is in total seconds.
How can I convert this to something like 31:14:12, being hours:minutes:seconds?
PS> $ts = New-TimeSpan -Seconds 1234567
PS> '{0:00}:{1:00}:{2:00}' -f $ts.Hours,$ts.Minutes,$ts.Seconds
06:56:07
or
PS> "$ts" -replace '^\d+?\.'
06:56:07
All you have to do is use the Measure-Command cmdlet to get the time:
PS > measure-command { sleep 5}
Days : 0
Hours : 0
Minutes : 0
Seconds : 5
Milliseconds : 13
Ticks : 50137481
TotalDays : 5.80294918981481E-05
TotalHours : 0.00139270780555556
TotalMinutes : 0.0835624683333333
TotalSeconds : 5.0137481
TotalMilliseconds : 5013.7481
The above output itself might be good enough for you, or you can format it appropriately, as the output of Measure-Command is a TimeSpan object. Or you can use ToString:
PS > (measure-command { sleep 125}).tostring()
00:02:05.0017446
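Note that the Hours property wraps at 24, so a value like 31:14:12 from the question cannot come from Hours alone. A small sketch, assuming the goal is to show the total hours rather than days plus hours, takes the whole-hour part of TotalHours instead:

```powershell
$ts = New-TimeSpan -Seconds 1234567

# Whole hours across the full span, including any days.
$hours = [int][math]::Floor($ts.TotalHours)

# Minutes and Seconds are already the remainder components.
$text = '{0:00}:{1:00}:{2:00}' -f $hours, $ts.Minutes, $ts.Seconds
$text   # 342:56:07
```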