I have a string --> string function in PowerShell that is quite slow to execute, and I would like to memoize it, that is, preserve all the input/output pairs to speed up a script that calls this function over and over. I can think of many complicated ways of achieving this. Would anyone have anything not too convoluted to propose?
Well, here's a shot at it. I definitely cannot claim this is a good or efficient method; in fact, even coming up with something that works was tricky and a PowerShell expert might do better.
This is massive overkill in a scripting language, by the way (a global variable is far simpler), so this is definitely more of an intellectual exercise.
function Memoize($func) {
    $cachedResults = @{}
    {
        if (-not $cachedResults.ContainsKey("$args")) {
            Write-Host "Remembering $args..." # for illustration; Write-Host keeps this out of the return value
            $cachedResults.Add("$args", $func.Invoke($args))
        }
        $cachedResults["$args"]
    }.GetNewClosure()
}
function add($a, $b) {
    return $a + $b
}
$add = Memoize ${function:add}
&$add 5 4
&$add 5 4
&$add 1 2
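The sketch below checks that the cache is actually hit. It is self-contained: it wraps a deliberately slow string --> string function and counts how often that function's body runs. `Get-Slow`, `$fastSlow` and `$global:slowCalls` are hypothetical names used only for illustration, and the `Start-Sleep` stands in for the expensive work.

```powershell
# Same idea as Memoize above: a closure over a private hashtable.
function Memoize([scriptblock] $func) {
    $cache = @{}
    {
        $key = "$args"   # argument list joined with spaces, used as the cache key
        if (-not $cache.ContainsKey($key)) {
            $cache[$key] = $func.Invoke($args)
        }
        $cache[$key]
    }.GetNewClosure()
}

# Hypothetical slow string -> string function that counts its own executions.
$global:slowCalls = 0
function Get-Slow([string] $s) {
    $global:slowCalls++
    Start-Sleep -Milliseconds 200   # stand-in for the expensive part
    $s.ToUpper()
}

$fastSlow = Memoize ${function:Get-Slow}
$r1 = & $fastSlow "abc"   # runs Get-Slow
$r2 = & $fastSlow "abc"   # served from the cache, Get-Slow is not called again
"$r1 $r2 calls=$global:slowCalls"
```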
I don't know if writing a .NET PowerShell module is practical in your case. If it is, you could use a technique similar to the one I used in one of my projects.
Steps:
Rewrite your function as a cmdlet
Have a static class with a static hash table where you store the results
Add a -Cache flag to the cmdlet so you can run it with and without the cache (if needed)
My scenario is a little different, as I am not using a hash table.
My cache: https://github.com/Swoogan/Octopus-Cmdlets/blob/master/Octopus.Extensions/Cache.cs
Usage: https://github.com/Swoogan/Octopus-Cmdlets/blob/master/Octopus.Cmdlets/GetEnvironment.cs
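If you don't want to write a compiled cmdlet, you can approximate the same pattern in script: a cache that outlives individual calls, plus an opt-in `-Cache` switch. The sketch below assumes the function lives in a script or module file, so a script-scoped hashtable plays the role of the static hash table. `Get-Expensive` and `$script:ResultCache` are hypothetical names, and `ToUpper()` stands in for the real work.

```powershell
# Script-scoped cache: lives as long as the script/module is loaded.
$script:ResultCache = @{}

function Get-Expensive {
    param(
        [Parameter(Mandatory)] [string] $InputString,
        [switch] $Cache   # opt in to the cache, mirroring the -Cache flag idea
    )
    if ($Cache -and $script:ResultCache.ContainsKey($InputString)) {
        return $script:ResultCache[$InputString]
    }
    Start-Sleep -Milliseconds 200          # stand-in for the slow work
    $result = $InputString.ToUpper()
    if ($Cache) { $script:ResultCache[$InputString] = $result }
    $result
}

Get-Expensive -InputString "abc" -Cache   # computes and stores
Get-Expensive -InputString "abc" -Cache   # returns the stored value
```

Calls without `-Cache` always recompute and never touch the hashtable. This lets you compare cached and uncached behavior, as the original steps suggest.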
I thought of something pretty simple, like the following:
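$cache = @{}
function memoized
{
    param ([string] $str)
    # ContainsKey also treats a cached $null result as a hit
    if (-not $cache.ContainsKey($str))
    {
        # placeholder for the slow computation
        $cache[$str] = "something complicated goes on here"
    }
    return $cache[$str]
}
memoized "testing"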
I am writing custom cmdlets to perform a task. One of the cmdlets depends on the others and takes a really long time, so I wanted to perform that task inside a job. Well, guess what: you can't, because the other cmdlets are not available inside the job's scope. Why? Other languages like C++, Java and C# allow you to use variables, objects and functions from within the same scope, so why isn't this available in PowerShell? Why is it so decoupled? I feel like it makes things harder for developers. Maybe I don't get the topic, but I would like to do something like this:
function Write-Yes {
    Write-Host "yes"
}
function Write-No {
    Write-Host "no"
}
function Write-Random {
    $result = @($true, $false) | Get-Random
    if ($result) {
        Write-Yes
    }
    else {
        Write-No
    }
}
Start-Job -ScriptBlock { Write-Random }
This is not possible. You have to resort to hacks, like passing the function's script block as an argument and calling it with the call operator, or something similar. Or you can even use Import-Module to re-import the same file you are working in. This feels overly complicated. The only module I have seen that can do something like this is PoshRSJob. It lets you name the cmdlets to be used inside the job and creates them dynamically for you, again with some lexical parsing and overly complicated machinery.
Why are things the way they are, and is there any way to do what I'm trying in the example in an elegant way?
Start-Job is very limited and slow. It was implemented very poorly, IMHO, and I never use it. You can use runspaces for fast and lightweight "background jobs" and import functions and variables from your current session.
Example:
function Write-Yes { "yes" }
function Write-No { "no" }
function Write-Random {
    if ($true, $false | Get-Random) {
        Write-Yes
    }
    else {
        Write-No
    }
}
# set up the session and import the functions
$session = [System.Management.Automation.Runspaces.InitialSessionState]::CreateDefault()
"Write-Yes", "Write-No", "Write-Random" | foreach {
    $session.Commands.Add((
        New-Object System.Management.Automation.Runspaces.SessionStateFunctionEntry $_, (Get-Content "Function:\$_")
    ))
}
# set up a separate PowerShell instance
$job = [Powershell]::Create($session)
[void]$job.AddScript({ Write-Random })
# start asynchronously
$asyncResult = $job.BeginInvoke()
# do stuff ...
# wait for completion
$job.EndInvoke($asyncResult)
$job.Dispose()
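The single runspace above extends naturally to several parallel workers sharing a runspace pool. Below is a self-contained sketch using the same three demo functions. The pool size of 4 and the four queued invocations are arbitrary choices, and `[runspacefactory]::CreateRunspacePool` with an `InitialSessionState` is one way to give every pooled runspace the imported functions.

```powershell
function Write-Yes { "yes" }
function Write-No { "no" }
function Write-Random { if ($true, $false | Get-Random) { Write-Yes } else { Write-No } }

# Import the three functions into an initial session state, as in the answer.
$session = [System.Management.Automation.Runspaces.InitialSessionState]::CreateDefault()
foreach ($name in "Write-Yes", "Write-No", "Write-Random") {
    $def = (Get-Command $name -CommandType Function).Definition
    $entry = New-Object System.Management.Automation.Runspaces.SessionStateFunctionEntry -ArgumentList $name, $def
    $session.Commands.Add($entry)
}

# A pool of at most 4 runspaces; every runspace in it starts from $session.
$pool = [runspacefactory]::CreateRunspacePool(1, 4, $session, $Host)
$pool.Open()

# Queue four invocations; BeginInvoke returns immediately.
$jobs = foreach ($i in 1..4) {
    $ps = [powershell]::Create()
    $ps.RunspacePool = $pool
    [void]$ps.AddScript({ Write-Random })
    [pscustomobject]@{ Shell = $ps; Handle = $ps.BeginInvoke() }
}

# EndInvoke blocks until each invocation finishes and returns its output.
$results = foreach ($j in $jobs) {
    $j.Shell.EndInvoke($j.Handle)
    $j.Shell.Dispose()
}
$pool.Close()
$pool.Dispose()
$results
```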
But in general, PowerShell is not made for complex parallel processing; it's usually best to put everything inside a script file and run that as a task, background job, etc.
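You can still stay with Start-Job if you prefer. One workaround (a sketch, not something the answer proposes) ships the function definitions as text through `-InitializationScript`, which runs in the job's process before the main script block. The sketch assumes the functions are plain script functions whose bodies can be read back via `Get-Command ... .Definition`.

```powershell
function Write-Yes { "yes" }
function Write-No { "no" }
function Write-Random { if ($true, $false | Get-Random) { Write-Yes } else { Write-No } }

# Rebuild each function as source text, since the job runs in a separate process.
$defs = foreach ($name in "Write-Yes", "Write-No", "Write-Random") {
    "function $name {`n$((Get-Command $name -CommandType Function).Definition)`n}"
}
$init = [scriptblock]::Create(($defs -join "`n"))

# The initialization script defines the functions inside the job first.
$job = Start-Job -InitializationScript $init -ScriptBlock { Write-Random }
$result = Receive-Job -Job $job -Wait -AutoRemoveJob
$result
```

This still pays Start-Job's process start-up cost, so the runspace approach remains the faster option.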
I haven't used Perl for around 20 years, and this is confusing me. I've g******d for it, but I obviously haven't used a suitable search string because I haven't found anything relating to this...
Why would I want to do the following? I understand what it's doing, but the "why" escapes me. Why not just return 0 or 1 to begin with?
I'm working on some code where a sub uses "return sub"; here's a very truncated example:
sub test1 {
    $a = shift @_;
    if ($a eq "cat") {
        return sub {
            print("cat test OK\n");
            return 0;
        }
    }
    # default if "cat" wasn't the argument
    return sub {
        print("test for cat did not work\n");
        return 1;
    }
}
$c = test1("cat");
print ("received: $c\n");
print ("value is: ",&$c,"\n");
$c = test1("bat");
print ("received: $c\n");
print ("value is: ",&$c,"\n");
In your code there is no reason to return a sub. However, with a little tweak:
sub test1 {
    my $animal = shift @_;
    if ($animal eq "cat" || $animal eq "dog") {
        return sub {
            print("$animal test OK\n");
            return 0;
        };
    }
    # default if "cat" or "dog" wasn't the argument
    return sub {
        print("test for cat or dog did not work\n");
        return 1;
    };
}
We now have a closure around $animal. This saves memory, as the tests for cat and dog share the same code. Note that this only works with lexical (my) variables. Also note that $a and $b are slightly special to Perl: they are used in the block of code that you can pass to the sort function and bypass some of the visibility checks, so it's best to avoid them for anything except sort.
You probably want to search "perl closures".
There are many reasons you'd want to return a code reference, but it's not something I can answer briefly in a Stack Overflow answer. Mark Jason Dominus's Higher-Order Perl is a good way to expand your mind, and we cover a little of that in Intermediate Perl.
I wrote File::Find::Closures as a way to demonstrate this in class. Each subroutine in that module returns two code references: one for the callback to File::Find and the other as a way to access the results. The two share a common variable that nothing else can access.
Notice in your case, you aren't merely calling a subroutine to "get a zero". It's doing other things. Even in your simple example there's some output. Some behavior is then deferred until you actually use the result for something.
Having said that, we have no chance of understanding why the programmer who wrote your particular code did it. One plausible guess is that the system was set up for more complex situations and you're looking at a trivial example that fits into that. Another plausible guess is that the programmer had just learned that feature, fell in love with it, and then used it everywhere for a year. There's always a reason, but that doesn't mean there's always a good reason.
When could the following code have side effects?
@some = map { s/xxx/y/; $_ } @some;
Perl::Critic flags it as dangerous because, for example, in
@other = map { s/xxx/y/; $_ } @some;
the members of @some also get modified. I understand that. I have the PBP book, and it shows the above with the example
@pm_files_without_pl_files
    = grep { s/.pm\z/.pl/xms && !-e } @pm_files;
I have also read the chapter "List Processing Side Effects" / "Never modify $_ in a list function." and the sections that follow it. I also know about /r.
To be clear (as much as is possible with my terrible English):
In the first example, the main point is modifying the original @some.
The question is:
Could the first example, @some = map { s/xxx/y/; $_ } @some;, cause some unwanted side effects? If yes, when?
Or is it just the "not recommended" way (but otherwise harmless)?
I'm looking for an answer that goes a bit deeper than a "Perl beginner's book", which is why I haven't accepted any of the current answers yet. ;)
One of the mottos of perl has always been TIMTOWTDI: there is more than one way to do it. If two ways have the same end result, they're equally correct. That doesn't mean there aren't reasons to prefer one way over the other.
In the first case, it would be more obvious (to me, YMMV) to do something like
s/xxx/y/ for @some;
This is mainly because it communicates intent better. for suggests it's all about the side effect, whereas map suggests it's about the return value. While functionally identical, this should be much easier to understand for your fellow programmers (and probably for yourself six months from now).
There's more than one way, but some are better than others.
Code like your example:
@some = map { s/xxx/y/; $_ } @some;
should be avoided because it's redundant and confusing. It looks like the assignment on the left should be doing something, even though it's actually a no-op. Indeed, just writing:
map { s/xxx/y/; $_ } @some;
would have the exact same effect, as would:
map { s/xxx/y/ } @some;
This version at least has the virtue of making it (reasonably) clear that the return value of map is being ignored, and that the actual purpose of the statement is to modify @some in place.
But of course, as Leon has already pointed out, by far the clearest and most idiomatic way of writing this would be:
s/xxx/y/ for @some;
@some = map { s/xxx/y/; $_ } @some;
will work fine, but it's very poor code because it's not obvious that you're effectively doing
map { s/xxx/y/ } @some; @some = @some;
This already shows you could simply have done
map { s/xxx/y/ } @some;
But that's a misleading and inefficient version of
s/xxx/y/ for @some;
It's all about readability and maintainability.
Note that you can do
use List::MoreUtils qw( apply );
@some = apply { s/xxx/y/ } @some;
And in Perl 5.14+,
@some = map { s/xxx/y/r } @some;
User cashfoley has posted what appears to be a fairly elegant set of code on CodePlex for a "module" called PSClass.
When I dot-source the psclass code into some code of my own, I am able to write code like:
$Animal = New-PSClass Animal {
    constructor {
        param( $name, $legs )
        # ...
    }
    method -override ToString {
        "A $($this.Class.ClassName) named $($this.name) with $($this.Legs) Legs"
    }
}
When I tried to create a module out of the PSClass code, however, I started getting errors. The constructor and method names are no longer recognized.
Looking at the actual implementation, what I see is that constructor, method, etc. are actually nested functions inside the New-PSClass function.
Thus, it seems to me that when I dot-source the PSClass.ps1 file, my script-blocks are allowed to contain references to functions nested inside other local functions. But when the PSClass code becomes a module, with the New-PSClass function exported (I tried both using a manifest and using Export-ModuleMember), the names are no longer visible.
Can someone explain to me how the script blocks, scoping rules, and visibility rules for nested functions work together?
Also, somewhat separately, is there a better class definition protocol for pure PowerShell scripting? (Specifically, one that does not involve "just write it in C# and then do this...")
The variables in your script blocks don't get evaluated until the blocks are executed. If the variables don't exist in the current scope when the block is executed, they won't have any values. By default, script blocks aren't closures: they don't capture the context at instantiation time.
Remove-Variable FooBar -ErrorAction SilentlyContinue
function New-ScriptBlock
{
    $FooBar = 1
    $scriptBlock = {
        Write-Host "FooBar: $FooBar"
    }
    $FooBar = 2
    & $scriptBlock # Outputs "FooBar: 2" because $FooBar was set to 2 before invocation
    return $scriptBlock
}
function Invoke-ScriptBlock
{
    param(
        $ScriptBlock
    )
    & $ScriptBlock
}
$scriptBlock = New-ScriptBlock
& $scriptBlock # Prints "FooBar: " with no value, since $FooBar doesn't exist in this scope
$FooBar = 3
Invoke-ScriptBlock $scriptBlock # Prints "FooBar: 3" since $FooBar is now set to 3
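You can turn a script block into a closure explicitly: calling `.GetNewClosure()` on it copies the current local variables into a new dynamic module. The memoization answer earlier in this document relies on the same method. Below is a minimal sketch contrasting it with the example above. `New-ClosureBlock` is a hypothetical name.

```powershell
function New-ClosureBlock {
    $FooBar = 1
    # GetNewClosure() snapshots the local variables at this point,
    # so the returned block keeps its own copy of FooBar = 1.
    $captured = { "FooBar: $FooBar" }.GetNewClosure()
    $FooBar = 2   # does not affect the captured copy
    $captured
}

$FooBar = 3       # the caller's variable is shadowed by the captured one
$block = New-ClosureBlock
& $block          # FooBar: 1
```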
I have a simple function that creates a generic List:
function test()
{
    $genericType = [Type] "System.Collections.Generic.List``1"
    [type[]] $typedParameters = ,"System.String"
    $closedType = $genericType.MakeGenericType($typedParameters)
    [Activator]::CreateInstance($closedType)
}
$a = test
The problem is that $a is always null no matter what I try. If I execute the same code outside of the function it works properly.
Thoughts?
IMHO that's pitfall #1. If you return an object from a function that is somehow enumerable (I don't know exactly whether implementing IEnumerable is the only case), PowerShell unrolls the object and returns the items in it.
Your newly created list was empty, so nothing was returned. To make it work just use this:
,[Activator]::CreateInstance($closedType)
That will make a one-item array that gets unrolled, and the item (the generic list) is assigned to $a.
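To see the difference directly, here is a self-contained sketch that builds the same `List[string]` through `[Activator]` and returns it with and without the leading comma. `New-StringList` and its `-KeepList` switch are hypothetical names used only for this comparison.

```powershell
function New-StringList([switch] $KeepList) {
    $genericType = [Type] "System.Collections.Generic.List``1"
    $closedType = $genericType.MakeGenericType([type[]] @([string]))
    $list = [Activator]::CreateInstance($closedType)
    if ($KeepList) {
        , $list   # one-element array: PowerShell unrolls the wrapper, not the list
    } else {
        $list     # empty enumerable: unrolled into zero items
    }
}

$unrolled = New-StringList
$kept = New-StringList -KeepList
$null -eq $unrolled    # True
$kept.GetType().Name   # List`1
```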
Further info
Here is a list of similar questions that will help you understand what's going on:
Powershell pitfalls
Avoiding Agnostic Jagged Array Flattening in Powershell
Strange behavior in PowerShell function returning DataSet/DataTable
What determines whether the Powershell pipeline will unroll a collection?
Note: you don't need to declare the function header with parentheses. If you need to add parameters, the function can look like this:
function test {
    param($myParameter, $myParameter2)
}
or
function test {
    param(
        [Parameter(Mandatory=$true, Position=0)]$myParameter,
        [Parameter(Mandatory=$true, Position=1)]$myParameter2
    )
    # ...
}
An easier way to work with generics (this does not directly fix the [Activator] approach, though):
Function test
{
    New-Object "system.collections.generic.list[string]"
}
(test).gettype()