I'm writing a PowerShell script (PS version 4) to parse and process IIS log files, and I've come across an issue I don't quite understand: write-output seems to add significant processing time to the script. The core of it is this (there is more, but this demonstrates the issue):
$file = $args[0]
$linecount = 0
$start = [DateTime]::Now
$reader = [IO.File]::OpenText($file)
while ($reader.Peek() -ge 0) {
$line = $reader.ReadLine()
$linecount++
if (0 -eq ($linecount % 10000)) {
$batch = [DateTime]::Now
[Console]::Error.WriteLine(" Processed $linecount lines ($file) in $(($batch - $start).TotalMilliseconds)ms")
$start = $batch
}
$parts = $line.split(' ')
$out = "$file,$($parts[0]) $($parts[1]),$($parts[2]),$($parts[3]),$($parts[4]),$($parts[5]),$($parts[6]),$($parts[7])"
## Send the output out - comment in/out the desired output method
## Direct output - roughly 10,000 lines / 880ms
$out
## Via write-output - roughly 10,000 lines / 1500ms
write-output $out
}
$reader.Close()
Invoked as .\script.ps1 {path_to_340,000_line_IIS_log} > bob.txt; progress/performance timings are given on stderr.
The script above has two output lines - the write-output one is reporting 10,000 lines every 1500ms, whereas the line without write-output takes a little over half that, averaging about 880ms per 10,000 lines.
I thought that uncaptured output defaulted to write-output (i.e., I thought that "bob" on its own was equivalent to write-output "bob"), but the timings I'm getting argue against this.
What am I missing here?
Just a guess, but:
Looking at the help on write-output
Write-Output [-InputObject] <PSObject[]> [-NoEnumerate] [<CommonParameters>]
You're giving it a list of objects as an argument, so it has to spend a little time assembling them into an array internally before it does the write, whereas simply outputting them streams them to the pipeline immediately. You could pipe them to Write-Output instead, but that would add another pipeline stage, which might be even worse.
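If you want to see the per-call overhead in isolation, a rough sketch like the following (an arbitrary 100,000-iteration loop; absolute numbers will vary by machine) shows the same pattern:
Measure-Command {
    foreach ($i in 1..100000) { "line $i" }              # implicit output
} | Select-Object TotalMilliseconds
Measure-Command {
    foreach ($i in 1..100000) { Write-Output "line $i" } # explicit Write-Output call
} | Select-Object TotalMilliseconds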
Edit
In addition, you'll find that it's only adding about 0.062ms per operation ((1500 - 880) / 10000). You have to scale that up to very large data sets before it becomes noticeable.
Related
So I have a CSV file which is 25 MB.
I only need to get the value stored in the 2nd line, first column, and use it later in the PowerShell script.
e.g. data
File_name,INVNUM,ID,XXX....850 columns
ABCD,123,090,xxxx.....850 columns
ABCD,120,091,xxxx.....850 columns
xxxxxx5000+ rows
So my first-column data is always the same, and I just need to get this filename from the first column, 2nd row.
Should I use Get-Content or Import-Csv for this use case?
Thanks,
Mickey
TessellatingHeckler's helpful answer contains a pragmatic, easy-to-understand solution that is most likely fast enough in practice; the same goes for Robert Cotterman's helpful answer which is concise (and also faster).
If performance is really paramount, you can try the following, which uses the .NET framework directly to read the lines - but given that you only need to read 2 lines, it's probably not worth it:
$inputFile = "$PWD/some.csv" # be sure to specify a *full* path
$isFirstLine=$true
$fname = foreach ($line in [IO.File]::ReadLines($inputFile)) {
if ($isFirstLine) { $isFirstLine = $false; continue } # skip header line
$line -replace '^([^,]*),.*', '$1' # extract 1st field from 2nd line and exit
break # exit
}
Note: A conceptually simpler way to extract the 1st field is to use ($line -split ',')[0], but with a large number of columns the above -replace-based approach is measurably faster.
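For illustration, both forms return the same field; only the amount of work differs (hypothetical sample row):
$line = 'ABCD,123,090,xxxx'                  # hypothetical sample row
($line -split ',')[0]                        # ABCD - simpler, but splits all 850 fields
$line -replace '^([^,]*),.*', '$1'           # ABCD - regex capture, avoids allocating the full split array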
Update: TessellatingHeckler offers 2 ways to speed up the above:
Use of $line.Substring(0, $line.IndexOf(',')) in lieu of $line -replace '^([^,]*),.*', '$1' in order to avoid relatively costly regex processing.
For a lesser gain, use of a [System.IO.StreamReader] instance's .ReadLine() method twice in a row rather than [IO.File]::ReadLines() in a loop.
Here's a performance comparison of the approaches across all answers on this page (as of this writing).
To run it yourself, you must download functions New-CsvSampleData and Time-Command first.
For more representative results, the timings are averaged across 1,000 runs:
# Create sample CSV file 'test.csv' with 850 columns and 100 rows.
$testFileName = "test-$PID.csv"
New-CsvSampleData -Columns 850 -Count 100 | Set-Content $testFileName
# Compare the execution speed of the various approaches:
Time-Command -Count 1000 {
# Import-Csv
Import-Csv -LiteralPath $testFileName |
Select-Object -Skip 1 -First 1 -ExpandProperty 'col1'
}, {
# ReadLines(), -replace
$inputFile = $PWD.ProviderPath + "/$testFileName"
$isFirstLine=$true
foreach ($line in [IO.File]::ReadLines($inputFile)) {
if ($isFirstLine) { $isFirstLine = $false; continue } # skip header line
$line -replace '^([^,]*),.*', '$1' # extract 1st field from 2nd line and exit
break # exit
}
}, {
# ReadLines(), .Substring / IndexOf
$inputFile = $PWD.ProviderPath + "/$testFileName"
$isFirstLine=$true
foreach ($line in [IO.File]::ReadLines($inputFile)) {
if ($isFirstLine) { $isFirstLine = $false; continue } # skip header line
$line.Substring(0, $line.IndexOf(',')) # extract 1st field from 2nd line and exit
break # exit
}
}, {
# ReadLine() x 2, .Substring / IndexOf
$inputFile = $PWD.ProviderPath + "/$testFileName"
$f = [System.IO.StreamReader]::new($inputFile,$true);
$null = $f.ReadLine(); $line = $f.ReadLine()
$line.Substring(0, $line.IndexOf(','))
$f.Close()
}, {
# Get-Content -Head, .Split()
((Get-Content $testFileName -Head 2)[1]).split(',')[1]
} |
Format-Table Factor, Timespan, Command
Remove-Item $testFileName
Sample output from a single-core Windows 10 VM running Windows PowerShell v5.1 / PowerShell Core 6.1.0-preview.4 on a recent-model MacBook Pro:
Windows PowerShell v5.1:
Factor TimeSpan Command
------ -------- -------
1.00 00:00:00.0001922 # ReadLine() x 2, .Substring / IndexOf...
1.04 00:00:00.0002004 # ReadLines(), .Substring / IndexOf...
1.57 00:00:00.0003024 # ReadLines(), -replace...
3.25 00:00:00.0006245 # Get-Content -Head, .Split()...
25.83 00:00:00.0049661 # Import-Csv...
PowerShell Core 6.1.0-preview.4:
Factor TimeSpan Command
------ -------- -------
1.00 00:00:00.0001858 # ReadLine() x 2, .Substring / IndexOf...
1.03 00:00:00.0001911 # ReadLines(), .Substring / IndexOf...
1.60 00:00:00.0002977 # ReadLines(), -replace...
3.30 00:00:00.0006132 # Get-Content -Head, .Split()...
27.54 00:00:00.0051174 # Import-Csv...
Conclusions:
Calling .ReadLine() twice is marginally faster than the ::ReadLines() loop.
Using -replace instead of Substring() / IndexOf() adds about 60% execution time.
Using Get-Content is more than 3 times slower.
Using Import-Csv | Select-Object is close to 30 times(!) slower, presumably due to the large number of columns; that said, in absolute terms we're still only talking about around 5 milliseconds.
As a side note: execution on macOS seems to be noticeably slower overall, with the regex solution and the cmdlet calls also being slower in relative terms.
Depends what you want to prioritize.
$data = Import-Csv -LiteralPath 'c:\temp\data.csv' |
Select-Object -Skip 1 -First 1 -ExpandProperty 'File_Name'
It's short and convenient. (2nd line meaning the 2nd line of the file, or the 2nd line of the data? Don't skip any if you mean the first line of data.)
Select-Object with something like -First 1 will break the whole pipeline when it's done, so it won't wait to read the rest of the 25MB in the background before returning.
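A quick sketch (hypothetical counter) shows that early termination in action:
$iterations = 0
1..1000000 | ForEach-Object { $iterations++; $_ } |
    Select-Object -Skip 1 -First 1 |
    Out-Null
$iterations   # 2 - the upstream commands were stopped after the second element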
You could likely speed it up, or reduce memory use, by a minuscule amount if you opened the file, seeked past two newlines, then a comma, then read up to another comma, or some other long, detailed code, but I very much doubt it would be worth it.
Same with Get-Content: the way it adds NoteProperties to the output strings means it's likely no easier on memory and not usefully faster than Import-Csv.
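You can see those extra NoteProperties for yourself with a quick check (using the same path as in the snippet above):
Get-Content 'c:\temp\data.csv' -TotalCount 1 | Get-Member -MemberType NoteProperty
# PSChildName, PSDrive, PSParentPath, PSPath, PSProvider, ReadCount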
You could really shorten it with
(gc c:\file.txt -head 2)[1]
Only reads 2 lines and then grabs index 1 (second line).
You could then split it and grab index 1 of the split-up line:
((gc c:\file.txt -head 2)[1]).split(',')[1]
UPDATE: After seeing the new post with timings, I was inspired to do some tests myself (thanks mklement0). This was the fastest I could get to work:
$check = 0
foreach ($i in [IO.FILE]::ReadLines("$filePath")){
if ($check -eq 2){break}
if ($check -eq 1){$value = $i.split(',')[1]} #$value = your answer
$check++
}
Just thought of this: remove the -eq 2 check and put break after a semicolon once the -eq 1 branch has run (see the sketch below). About 5 ticks faster. Haven't tested.
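That untested variant would presumably look something like this (same $filePath as above):
$check = 0
foreach ($i in [IO.FILE]::ReadLines("$filePath")){
    if ($check -eq 1){$value = $i.split(',')[1]; break}   # grab the field, then stop
    $check++
}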
Here were my results over 40,000 tests:
GC split avg was 1.11307622 Milliseconds
GC split Min was 0.3076 Milliseconds
GC split Max was 18.1514 Milliseconds
ReadLines split avg was 0.3836625825 Milliseconds
ReadLines split Min was 0.2309 Milliseconds
ReadLines split Max was 31.7407 Milliseconds
Stream Reader avg was 0.4464924825 Milliseconds
Stream Reader MIN was 0.2703 Milliseconds
Stream Reader Max was 31.4991 Milliseconds
Import-CSV avg was 1.32440485 Milliseconds
Import-CSV MIN was 0.2875 Milliseconds
Import-CSV Max was 103.1694 Milliseconds
I was able to run 3,000 tests a second on the 2nd and 3rd, and 1,000 tests a second on the first and last. StreamReader was his fastest one. And Import-Csv wasn't bad; I wonder whether mklement0's test CSV was missing a column named "file_name"? Anyhow, I'd personally use the GC command because it's concise and easy to remember. But this is up to you, and I wish you luck on your scripting adventures.
I'm certain we could start hyperthreading this and get insane results, but when you're talking thousandths of a second is it really a big deal? Especially to get one variable? :D
Here's the StreamReader code I used, for transparency reasons...
$inputFile = "$filePath"
$f = [System.IO.StreamReader]::new($inputFile,$true);
$null = $f.ReadLine(); $line = $f.ReadLine()
$line.Substring(0, $line.IndexOf(','))
$f.Close()
I also noticed this pulls the 1st value of the second line, and I have no idea how to switch it to the 2nd value... it seems to measure the length from position 0 to the first comma and then cut that out. If you change the Substring start from 0 to, say, 5, it still measures the length from 0 to the comma, but moves where it starts grabbing... to the 6th character.
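For what it's worth, one way to switch it to the 2nd value while keeping the Substring/IndexOf approach might be to locate the first two commas and cut the text between them (a sketch, assuming $line holds the second line read above):
$first  = $line.IndexOf(',')                                  # position of the 1st comma
$second = $line.IndexOf(',', $first + 1)                      # position of the 2nd comma
$value  = $line.Substring($first + 1, $second - $first - 1)   # text between them = 2nd field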
The Import-Csv I used was:
$data = Import-Csv -LiteralPath "$filePath" |
Select-Object -Skip 1 -First 1 -ExpandProperty 'FileName'
I tested these on a 90 MB CSV with 21 columns and 284k rows, and "FileName" was the second column.
I have a PowerShell script that reads a large CSV file (4GB+), finds certain lines, then writes the lines to other files.
I'm noticing that when it gets to the last line of the script, echo "Processed $datacounter total lines in the $datafile file", it doesn't actually finish until 5-10 minutes later.
What is it doing for that period? When it does finish, memory usage drops off significantly. Is there a way to force it to clear memory at the end of the script?
Screenshot of Memory Usage
Screenshot of script timestamps
Here is the final version of my script for reference.
# Get the filename
$datafile = Read-Host "Filename"
$dayofweek = Read-Host "Day of week (IE 1 = Monday, 2 = Tuesday..)"
$campaignWriters = @{}
# Create campaign ID hash table
$campaignByID = @{}
foreach($c in (Import-Csv 'campaigns.txt' -Delimiter '|')) {
foreach($id in ($c.CampaignID -split ' ')) {
$campaignByID[$id] = $c.CampaignName
}
foreach($cname in ($c.CampaignName)) {
$writer = $campaignWriters[$cname] = New-Object IO.StreamWriter($dayofweek + $cname + '_filtered.txt')
if($dayofweek -eq 1) {
$writer.WriteLine("ID1|ID2|ID3|ID4|ID5|ID6|Time|Time-UTC-Sec")
}
}
}
# Display the campaigns
$campaignByID.GetEnumerator() | Sort-Object Value
# Read in data file
$encoding = [Text.Encoding]::GetEncoding('iso-8859-1')
$datareader = New-Object IO.StreamReader($datafile, $encoding)
$datacounter = 0
echo "Starting.."
get-date -Format g
while (!$datareader.EndOfStream) {
$data = $datareader.ReadLine().Split('þ')
# Find the Campaign in the hashtable
$campaignName = $campaignByID[$data[3]]
if($campaignName) {
$writer = $campaignWriters[$campaignName]
# If a campaign name was returned from the hash, add the line using that campaign's writer
$writer.WriteLine(($data[20,3,5,8,12,14,0,19] -join '|'))
}
$datacounter++;
}
$datareader.Close()
foreach ($writer in $campaignWriters.Values) {
$writer.Close()
}
echo "Done!"
get-date -Format g
echo "Processed $datacounter total lines in the $datafile file"
I'm assuming that campaigns.txt is the multi-gigabyte file you are referring to. If it's the other file(s), this might not make as much sense.
If so, invoking Import-Csv inside the parentheses and then using the foreach statement to iterate through the records is what's driving your memory usage so high. A better alternative would be to use a PowerShell pipeline to stream records from the file without needing to keep all of them in memory at the same time. You achieve this by changing the foreach statement into a call to the ForEach-Object cmdlet:
Import-Csv 'campaigns.txt' -Delimiter '|' | ForEach-Object {
foreach($id in ($_.CampaignID -split ' ')) {
$campaignByID[$id] = $_.CampaignName
}
}
The .NET garbage collector is optimized for cases where the majority of objects are short-lived. Therefore this change should result in a noticeable performance increase, as well as reduced wind-down time at the end.
I advise against forcing garbage collection with [System.GC]::Collect(); the garbage collector knows best when it should run. The reasons for this are complex; if you really want to know the details of why this is true, Maoni's blog has a wealth of information about garbage collection in the .NET environment.
It may or may not work, but you can try to tell garbage collection to run:
[System.GC]::Collect()
You don't have fine-grained control over it though, and it may help to Remove-Variable or set variables to $null for some things before running it, so that there are no longer references to the data.
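A rough sketch of that clean-up pattern, reusing the variable names from the script above:
# Drop references to the big objects first, so the collector can actually reclaim them...
Remove-Variable -Name campaignWriters, campaignByID -ErrorAction SilentlyContinue
$data = $null
# ...then ask for a collection. No guarantees - the GC may already have run on its own.
[System.GC]::Collect()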
Info
I've created a script which analyzes the debug logs from Windows DNS Server.
It does the following:
Open debug log using [System.IO.File] class
Perform a regex match on each line
Separate 16 capture groups into different properties inside a custom object
Fill dictionaries and append to the value of each key to produce statistics
Steps 1 and 2 take the longest. In fact, they take a seemingly endless amount of time, because the file is growing as it is being read.
Problem
Due to the size of the debug log (80,000kb) it takes a very long time.
I believe that my code is fine for smaller text files, but it fails to deal with much larger files.
Code
Here is my code: https://github.com/cetanu/msDnsStats/blob/master/msdnsStats.ps1
Debug log preview
This is what the debug looks like (including the blank lines)
Multiply this by about 100,000,000 and you have my debug log.
21/03/2014 2:20:03 PM 0D0C PACKET 0000000005FCB280 UDP Rcv 202.90.34.177 3709 Q [1001 D NOERROR] A (2)up(13)massrelevance(3)com(0)
21/03/2014 2:20:03 PM 0D0C PACKET 00000000042EB8B0 UDP Rcv 67.215.83.19 097f Q [0000 NOERROR] CNAME (15)manchesterunity(3)org(2)au(0)
21/03/2014 2:20:03 PM 0D0C PACKET 0000000003131170 UDP Rcv 62.36.4.166 a504 Q [0001 D NOERROR] A (3)ekt(4)user(7)net0319(3)com(0)
21/03/2014 2:20:03 PM 0D0C PACKET 00000000089F1FD0 UDP Rcv 80.10.201.71 3e08 Q [1000 NOERROR] A (4)dns1(5)offis(3)com(2)au(0)
Request
I need ways or ideas on how to open and read each line of a file more quickly than what I am doing now.
I am open to suggestions of using a different language.
I would trade this:
$dnslog = [System.IO.File]::Open("c:\dns.log","Open","Read","ReadWrite")
$dnslog_content = New-Object System.IO.StreamReader($dnslog)
For ($i=0;$i -lt $dnslog.length; $i++)
{
$line = $dnslog_content.readline()
if ($line -eq $null) { continue }
# REGEX MATCH EACH LINE OF LOGFILE
$pattern = $line | select-string -pattern $regex
# IGNORE EMPTY MATCH
if ($pattern -eq $null) {
continue
}
for this:
Get-Content 'c:\dns.log' -ReadCount 1000 |
ForEach-Object {
foreach ($line in $_)
{
if ($line -match $regex)
{
#Process matches
}
}
}
That will reduce the number of file read operations by a factor of 1000.
Trading away the select-string operation will require refactoring the rest of the code to work with $matches[$n] instead of $pattern.matches[0].groups[$n].value, but it is much faster. Select-String returns MatchInfo objects, which contain a lot of additional information about the match (line number, filename, etc.) that is great if you need it. If all you need is the strings from the captures, then it's wasted effort.
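For example, the inner loop might end up looking roughly like this (a sketch; $regex is the pattern from the original script and the variable names are placeholders):
foreach ($line in $_)
{
    if ($line -match $regex)
    {
        # -match fills the automatic $matches hashtable: $matches[0] is the whole match,
        # $matches[1], $matches[2], ... are the capture groups, replacing
        # $pattern.matches[0].groups[$n].value from the Select-String version.
        $firstGroup  = $matches[1]
        $secondGroup = $matches[2]
    }
}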
You're creating an object ($log), and then accumulating values into array properties:
$log.date += @($pattern.matches[0].groups[$n].value); $n++
That array addition is going to kill your performance. Also, hash table operations are faster than object property updates.
I'd create $log as a hash table first, and the key values as array lists:
$log = @{}
$log.date = New-Object collections.arraylist
Then inside your loop:
$log.date.Add($matches[1]) > $null
Then create your object from $log after you've populated all of the array lists.
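That last step could be as simple as building the object straight from the hashtable (a sketch; $logObject is a placeholder name):
# After the loop: build the output object directly from the populated hashtable
$logObject = New-Object PSObject -Property $log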
As a general piece of advice, use Measure-Command to find out which script blocks take the longest time.
That being said, the sleep process seems a bit weird. If I'm not mistaken, you sleep 20 ms after each row:
sleep -milliseconds 20
Multiply 20 ms by the log size, 100 million iterations, and you'll get quite a long total sleep time.
Try sleeping only after some decent batch size. See whether 10,000 rows works well, like so:
if($i % 10000 -eq 0) {
write-host -nonewline "."
start-sleep -milliseconds 20
}
I'm working with some multi-gigabyte text files and want to do some stream processing on them using PowerShell. It's simple stuff, just parsing each line and pulling out some data, then storing it in a database.
Unfortunately, get-content | %{ whatever($_) } appears to keep the entire set of lines at this stage of the pipe in memory. It's also surprisingly slow, taking a very long time to actually read it all in.
So my question is two parts:
How can I make it process the stream line by line and not keep the entire thing buffered in memory? I would like to avoid using up several gigs of RAM for this purpose.
How can I make it run faster? PowerShell iterating over a get-content appears to be 100x slower than a C# script.
I'm hoping there's something dumb I'm doing here, like missing a -LineBufferSize parameter or something...
If you are really going to work on multi-gigabyte text files then do not use PowerShell. Even if you find a way to read the files faster, processing a huge number of lines will be slow in PowerShell anyway, and you cannot avoid this. Even simple loops are expensive; say, for 10 million iterations (quite realistic in your case) we have:
# "empty" loop: takes 10 seconds
measure-command { for($i=0; $i -lt 10000000; ++$i) {} }
# "simple" job, just output: takes 20 seconds
measure-command { for($i=0; $i -lt 10000000; ++$i) { $i } }
# "more real job": 107 seconds
measure-command { for($i=0; $i -lt 10000000; ++$i) { $i.ToString() -match '1' } }
UPDATE: If you are still not scared then try to use the .NET reader:
$reader = [System.IO.File]::OpenText("my.log")
try {
for() {
$line = $reader.ReadLine()
if ($line -eq $null) { break }
# process the line
$line
}
}
finally {
$reader.Close()
}
UPDATE 2
There are comments about possibly better / shorter code. There is nothing wrong with the original code with for and it is not pseudo-code. But the shorter (shortest?) variant of the reading loop is
$reader = [System.IO.File]::OpenText("my.log")
while($null -ne ($line = $reader.ReadLine())) {
$line
}
System.IO.File.ReadLines() is perfect for this scenario. It returns all the lines of a file, but lets you begin iterating over the lines immediately which means it does not have to store the entire contents in memory.
Requires .NET 4.0 or higher.
foreach ($line in [System.IO.File]::ReadLines($filename)) {
# do something with $line
}
http://msdn.microsoft.com/en-us/library/dd383503.aspx
If you want to use straight PowerShell, check out the code below.
$content = Get-Content C:\Users\You\Documents\test.txt
foreach ($line in $content)
{
Write-Host $line
}
Is there a simple way to time the execution of a command in PowerShell, like the 'time' command in Linux?
I came up with this:
$s=Get-Date; .\do_something.ps1 ; $e=Get-Date; ($e - $s).TotalSeconds
But I would like something simpler like
time .\do_something.ps1
Yup.
Measure-Command { .\do_something.ps1 }
Note that one minor downside of Measure-Command is that you see no stdout output.
[Update, thanks to @JasonMArcher] You can fix that by piping the command output to some cmdlet that writes to the host, e.g. Out-Default, so it becomes:
Measure-Command { .\do_something.ps1 | Out-Default }
Another way to see the output would be to use the .NET Stopwatch class like this:
$sw = [Diagnostics.Stopwatch]::StartNew()
.\do_something.ps1
$sw.Stop()
$sw.Elapsed
You can also get the last command from history and subtract its EndExecutionTime from its StartExecutionTime.
.\do_something.ps1
$command = Get-History -Count 1
$command.EndExecutionTime - $command.StartExecutionTime
Use Measure-Command
Example
Measure-Command { <your command here> | Out-Host }
The pipe to Out-Host allows you to see the output of the command, which is
otherwise consumed by Measure-Command.
Simples
function time($block) {
$sw = [Diagnostics.Stopwatch]::StartNew()
&$block
$sw.Stop()
$sw.Elapsed
}
Then you can use it as:
time { .\some_command }
You may want to tweak the output
Here's a function I wrote which works similarly to the Unix time command:
function time {
Param(
[Parameter(Mandatory=$true)]
[string]$command,
[switch]$quiet = $false
)
$start = Get-Date
try {
if ( -not $quiet ) {
iex $command | Write-Host
} else {
iex $command > $null
}
} finally {
$(Get-Date) - $start
}
}
Source: https://gist.github.com/bender-the-greatest/741f696d965ed9728dc6287bdd336874
Using Stopwatch and formatting elapsed time:
Function FormatElapsedTime($ts)
{
$elapsedTime = ""
if ( $ts.Minutes -gt 0 )
{
$elapsedTime = [string]::Format( "{0:00} min. {1:00}.{2:00} sec.", $ts.Minutes, $ts.Seconds, $ts.Milliseconds / 10 );
}
else
{
$elapsedTime = [string]::Format( "{0:00}.{1:00} sec.", $ts.Seconds, $ts.Milliseconds / 10 );
}
if ($ts.Hours -eq 0 -and $ts.Minutes -eq 0 -and $ts.Seconds -eq 0)
{
$elapsedTime = [string]::Format("{0:00} ms.", $ts.Milliseconds);
}
if ($ts.Milliseconds -eq 0)
{
$elapsedTime = [string]::Format("{0} ms", $ts.TotalMilliseconds);
}
return $elapsedTime
}
Function StepTimeBlock($step, $block)
{
Write-Host "`r`n*****"
Write-Host $step
Write-Host "`r`n*****"
$sw = [Diagnostics.Stopwatch]::StartNew()
&$block
$sw.Stop()
$time = $sw.Elapsed
$formatTime = FormatElapsedTime $time
Write-Host "`r`n`t=====> $step took $formatTime"
}
Usage Samples
StepTimeBlock ("Publish {0} Reports" -f $Script:ArrayReportsList.Count) {
$Script:ArrayReportsList | % { Publish-Report $WebServiceSSRSRDL $_ $CarpetaReports $CarpetaDataSources $Script:datasourceReport };
}
StepTimeBlock ("My Process") { .\do_something.ps1 }
All the answers so far fall short of the questioner's (and my) desire to time a command by simply adding "time " to the start of the command line. Instead, they all require wrapping the command in braces ({}) to make a block. Here is a short function that works more like time on Unix:
Function time() {
$command = $args -join ' '
Measure-Command { Invoke-Expression $command | Out-Default }
}
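With that function defined, you can then prefix a command line just as in the question (the second example is arbitrary; arguments are simply joined and passed through Invoke-Expression):
time .\do_something.ps1
time Get-ChildItem C:\Windows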
A more PowerShell inspired way to access the value of properties you care about:
$myCommand = ".\do_something.ps1"
Measure-Command { Invoke-Expression $myCommand } | Select -ExpandProperty Milliseconds
4
As Measure-Command returns a TimeSpan object.
Note: the TimeSpan object also has TotalMilliseconds as a double (such as 4.7322 in my case above), which might be useful to you. The same goes for TotalSeconds, TotalDays, etc.
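For instance, capturing the TimeSpan first makes the various totals easy to get at (a throwaway example; your numbers will differ):
$ts = Measure-Command { Start-Sleep -Milliseconds 50 }
$ts.Milliseconds        # whole-millisecond component, e.g. 52
$ts.TotalMilliseconds   # double, e.g. 52.7322
$ts.TotalSeconds        # e.g. 0.0527322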
(Measure-Command { your command }).TotalSeconds
for instance
(Measure-Command { .\do_something.ps1 }).TotalSeconds
Just a word on drawing (incorrect) conclusions from any of the performance measurement commands referred to in the answers. There are a number of pitfalls that should be taken into consideration aside from looking at the bare invocation time of a (custom) function or command.
Sjoemelsoftware
'Sjoemelsoftware' voted Dutch word of the year 2015
Sjoemelen means cheating, and the word sjoemelsoftware came into being due to the Volkswagen emissions scandal. The official definition is "software used to influence test results".
Personally, I think that "Sjoemelsoftware" is not always deliberately created to cheat test results but might originate from accommodating practical situations that are similar to test cases, as shown below.
As an example, using the listed performance measurement commands, Language Integrated Query (LINQ)(1) is often qualified as the fastest way to get something done, and it often is, but certainly not always! Anybody who measures a speed increase of a factor of 40 or more in comparison with native PowerShell commands is probably measuring incorrectly or drawing an incorrect conclusion.
The point is that some .NET classes (like LINQ) use lazy evaluation (also referred to as deferred execution(2)). That means that when you assign such an expression to a variable, it almost immediately appears to be done, but in fact it hasn't processed anything yet!
Let's presume that you dot-source your . .\Dosomething.ps1 command, which has either a PowerShell or a more sophisticated LINQ expression (for ease of explanation, I have embedded the expressions directly into the Measure-Command):
$Data = @(1..100000).ForEach{[PSCustomObject]@{Index=$_;Property=(Get-Random)}}
(Measure-Command {
$PowerShell = $Data.Where{$_.Index -eq 12345}
}).totalmilliseconds
864.5237
(Measure-Command {
$Linq = [Linq.Enumerable]::Where($Data, [Func[object,bool]] { param($Item); Return $Item.Index -eq 12345})
}).totalmilliseconds
24.5949
The result appears obvious: the latter LINQ command is about 40 times faster than the first PowerShell command. Unfortunately, it is not that simple...
Let's display the results:
PS C:\> $PowerShell
Index Property
----- --------
12345 104123841
PS C:\> $Linq
Index Property
----- --------
12345 104123841
As expected, the results are the same, but if you have paid close attention, you will have noticed that it took a lot longer to display the $Linq results than the $PowerShell results.
Let's specifically measure that by just retrieving a property of the resulting object:
PS C:\> (Measure-Command {$PowerShell.Property}).totalmilliseconds
14.8798
PS C:\> (Measure-Command {$Linq.Property}).totalmilliseconds
1360.9435
It took about a factor of 90 longer to retrieve a property of the $Linq object than of the $PowerShell object, and that was just a single object!
Also notice another pitfall: if you do it again, certain steps might appear a lot faster than before; this is because some of the expressions have been cached.
Bottom line: if you want to compare the performance of two functions, you will need to implement them in your actual use case, start with a fresh PowerShell session, and base your conclusion on the actual performance of the complete solution.
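One way to keep such a comparison honest, for example, is to force the deferred query to enumerate inside the measured block, so the LINQ timing includes actually producing the results rather than just defining the query (a sketch reusing the $Data and filter from above):
# Wrapping the LINQ call in @(...) forces full enumeration inside the measurement
(Measure-Command {
    $Linq = @([Linq.Enumerable]::Where($Data, [Func[object,bool]] { param($Item); Return $Item.Index -eq 12345 }))
}).totalmilliseconds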
(1) For more background and examples on PowerShell and LINQ, I recommend this site: High Performance PowerShell with LINQ
(2) I think there is a minor difference between the two concepts: with lazy evaluation the result is calculated when needed, as opposed to deferred execution, where the result is calculated when the system is idle.