How do I refer to an outside alias from inside a piglatin macro? - macros

I have an alias which I want to use in a macro:
foo = ....;
define my_macro (z) returns y {
$y = join $z in id, foo on id;
};
a = my_macro(b);
Alas, I get the error:
Undefined alias: macro_my_macro_foo_0
I can, of course, pass foo as en argument:
define my_macro (foo, z) returns y {
$y = join $z in id, $foo on id;
};
a = my_macro(foo,b);
Is this the right way?
If foo is actually a relatively complicated object, will it be recalculated for each macroexpansion of my_macro?

Yes the second approach is right one, you need to pass the alias as an argument to macro otherwise it will not be visible inside macro.
on the other side, alias defined inside the macro will not be access outside, in-case if you want to access the alias then use this format macro_<my macro_name>_<alias name suffixed with an instance>
I have simulated both the options
1. accessing alias from outside to inside macro(using argument)
2. accessing alias from inside macro to outside (using macro expanded name format)
example
in.txt
a,10,1000
b,20,2000
c,30,3000
in1.txt
10,aaa
20,bbb
30,ccc
Pigscript:
define my_macro (foo,z) returns y {
$y = join $z by g1, $foo by f2;
test = FOREACH $y generate $0,$2;
};
foo = LOAD 'in.txt' USING PigStorage(',') AS (f1,f2,f3);
b = LOAD 'in1.txt' USING PigStorage(',') AS (g1,g2);
C = my_macro(foo,b);
DUMP C;
--DUMP macro_my_macro_test_0;
Output of option1:
DUMP C
(10,aaa,a,10,1000)
(20,bbb,b,20,2000)
(30,ccc,c,30,3000)
Output of option2:
DUMP macro_my_macro_test_0
(10,a)
(20,b)
(30,c)
There are some restrictions in using the macro, like
1. not allowed inside nested for each stmt
2. not allowed to use any grunt commands
3. not allowed to include a user-defined schema
I suggest you to refer the below document link, this will definitely give some better ideas about macros and also how to use inside pig script.
http://pig.apache.org/docs/r0.13.0/cont.html#macros

Related

Dynamic variables, CALLERS, Scalars, and assignment

I recently noticed that that re-initializing dynamic variables does not have the semantics I expected in most cases using assignment (binding works the way I expected it to, however).
Specifically, in this code:
sub g {
my $*i = CALLERS::<$*i> // 0;
my $*a1 = CALLERS::<$*a1> // Array.new;
my #*a2 = CALLERS::<#*a2> // Array.new;
$*i++;
$*a1.push: 'v1';
#*a2.push: 'v2';
dd $*i;
dd $*a1;
dd #*a2;
}
sub f {
my $*i = 0;
my $*a1 = Array.new;
my #*a2 = Array.new;
g; g; g;
}
f
I expected output of 3, ["v1", "v1", "v1"], and ["v2", "v2", "v2"] but instead get 1, $["v1", "v1", "v1"], ["v2"]. Switching to binding solves the issue, so there's no problem I'm trying to solve – but I would very much like to understand why assignment doesn't work here. I notice that a Scalar pointing to an Array works, but a Scalar pointing to an Int doesn't. But in either case, I would have thought that the newly assigned variable would receive the value from CALLERS. What am I missing about the semantics of assignment?
What am I missing about the semantics of assignment?
I think what you're missing, doesn't have anything to do with dynamic variables per se. I think what you're missing is the fact that:
my #a = Array.new;
is basically a noop. Because of the single argument rule, it is the same as:
my #a = ();
which is the same as:
my #a;
So, in your example in sub f, the:
my #*a2 = Array.new;
in is just setting up an empty array in a dynamic variable.
Then in sub g the:
my #*a2 = CALLERS::<#*a2> // Array.new;
is just basically doing (because of the single argument rule):
my #*a2;
and hence you don't see the pushes you've done before, because every call it starts out fresh.
Regarding the value of $*i in sub g: that also is just incrementing a copy of the callers $*i, so the value remains at 1 in every call.
The reason that $*a1 works, is that containerization stops the flattening behaviour of the single argument rule. Observe the difference between:
sub a(+#a) { dd #a }; a [2,3,4]; # [2,3,4]
and:
sub a(+#a) { dd #a }; a $[2,3,4]; # [[2,3,4],]

Why can I not access a variable declared in a macro unless I pass in the name of the variable?

I have this macro:
macro_rules! set_vars {
( $($x:ident),* ) => {
let outer = 42;
$( let $x = outer; )*
}
}
Which expands this invocation:
set_vars!(x, y, z);
into what I expect (from --pretty=expanded):
let outer = 42;
let x = outer;
let y = outer;
let z = outer;
In the subsequent code I can print x, y, and z just fine, but outer seems to be undefined:
error[E0425]: cannot find value `outer` in this scope
--> src/main.rs:11:5
|
11 | outer;
| ^^^^^ not found in this scope
I can access the outer variable if I pass it as an explicit macro parameter.
Is this intentional, something to do with "macro hygiene"? If so, then it would probably make sense to mark such "internal" variables in --pretty=expanded in some special way?
Yes, this is macro hygiene. Identifiers declared within the macro are not available outside of the macro (and vice versa). Rust macros are not C macros (that is, Rust macros are more than glorified text replacement).
See also:
The Little Book of Rust Macros
A Practical Intro to Macros
So, what are hygienic macros anyway?

Call a script with definitions in a function

We have a script that defines values to names similar to #define in c. For example:
script.m:
ERR_NOERROR = 0;
ERR_FATAL = 1;
This script already exists and is used for value replacement when reading data from files.
Now we have a function (or more) that does some analysis and we would like to use the same definition in this function to avoid magic numbers. But when the script is called from the function we get an error.
Attempt to add "ERR_NOERROR" to a static workspace.
See MATLAB Programming, Restrictions on Assigning to Variables for details.
And this does not help much in the understanding of the problem.
The question is how can we make these definitions visible/usable in the functions with having to copying it every time.
Example:
function foo = bar(a)
run(script.m) %also tried running it without the run command
if a == ERR_NOERROR
foo = 5;
else
foo = 6;
end
end
edit:
There was a nested function,below in the function which I was not aware of. This explains the problem.
This kind of scoping error happens when you use nested or anonymous function within a function. The solution is well documented.
To your case, you can avoid nested function, or "Convert the script to a function and pass the variable using arguments", as the documentation suggests.
EDIT: I should have made it clear that the error occurs even if the script is not called within the nested function. Similar scenario is that, in debug mode (by setting up a break point), it will be an error if one tries to create a temporal variable to test something.
This is not a direct answer, rather a recommendation to switch to another method, which will not be mixing scope and workspace.
Instead of defining your constant in a script, you could make a class containing only constant properties. ex: code for error_codes.m:
classdef error_codes
% ---------------------------------------------------------------------
% Constant error code definition
% ---------------------------------------------------------------------
properties (Constant = true)
noerror = 0 ;
fatal = 1 ;
errorlvl2 = 2 ;
errorlvl3 = 3 ;
warning = -1 ;
% etc ...
end
end
I use this style for many different type of constants. For tidiness, I groups them all in a Matlab package directory (The directories which starts with a + character.
The added benefit of using constant class properties is the safety that the values cannot be changed in the middle of the code (your variables defined in a script could easily be overwritten by a careless user).
So assuming my file error_codes.m is placed in a folder:
\...somepath...\+Constants\error_codes.m
and of course the folder +Constants is on the MATLAB path, then to use it as in your example, instead of calling the script, just initialise an instance of the class, then use the constant values when you need them:
function foo = bar(a)
ERR = Constants.error_codes ;
if a == ERR.noerror
foo = 5;
else
foo = 6;
end
or it can works in switch statement too:
switch a
case ERR.noerror
foo = 5 ;
case ERR.warning
foo = 42 ;
case ERR.fatal
foo = [] ;
end

Can I write multiple statements in an anonymous function? [duplicate]

I'd like to do something like this:
>> foo = #() functionCall1() functionCall2()
So that when I said:
>> foo()
It would execute functionCall1() and then execute functionCall2(). (I feel that I need something like the C , operator)
EDIT:
functionCall1 and functionCall2 are not necessarily functions that return values.
Trying to do everything via the command line without saving functions in m-files may be a complicated and messy endeavor, but here's one way I came up with...
First, make your anonymous functions and put their handles in a cell array:
fcn1 = #() ...;
fcn2 = #() ...;
fcn3 = #() ...;
fcnArray = {fcn1 fcn2 fcn3};
...or, if you have functions already defined (like in m-files), place the function handles in a cell array like so:
fcnArray = {#fcn1 #fcn2 #fcn3};
Then you can make a new anonymous function that calls each function in the array using the built-in functions cellfun and feval:
foo = #() cellfun(#feval,fcnArray);
Although funny-looking, it works.
EDIT: If the functions in fcnArray need to be called with input arguments, you would first have to make sure that ALL of the functions in the array require THE SAME number of inputs. In that case, the following example shows how to call the array of functions with one input argument each:
foo = #(x) cellfun(#feval,fcnArray,x);
inArgs = {1 'a' [1 2 3]};
foo(inArgs); %# Passes 1 to fcn1, 'a' to fcn2, and [1 2 3] to fcn3
WORD OF WARNING: The documentation for cellfun states that the order in which the output elements are computed is not specified and should not be relied upon. This means that there are no guarantees that fcn1 gets evaluated before fcn2 or fcn3. If order matters, the above solution shouldn't be used.
The anonymous function syntax in Matlab (like some other languages) only allows a single expression. Furthermore, it has different variable binding semantics (variables which are not in the argument list have their values lexically bound at function creation time, instead of references being bound). This simplicity allows Mathworks to do some optimizations behind the scenes and avoid a lot of messy scoping and object lifetime issues when using them in scripts.
If you are defining this anonymous function within a function (not a script), you can create named inner functions. Inner functions have normal lexical reference binding and allow arbitrary numbers of statements.
function F = createfcn(a,...)
F = #myfunc;
function b = myfunc(...)
a = a+1;
b = a;
end
end
Sometimes you can get away with tricks like gnovice's suggestion.
Be careful about using eval... it's very inefficient (it bypasses the JIT), and Matlab's optimizer can get confused between variables and functions from the outer scope that are used inside the eval expression. It's also hard to debug and/or extent code that uses eval.
Here is a method that will guarantee execution order and, (with modifications mentioned at the end) allows passing different arguments to different functions.
call1 = #(a,b) a();
call12 = #(a,b) call1(b,call1(a,b));
The key is call1 which calls its first argument and ignores its second. call12 calls its first argument and then its second, returning the value from the second. It works because a function cannot be evaluated before its arguments. To create your example, you would write:
foo = #() call12(functionCall1, functionCall2);
Test Code
Here is the test code I used:
>> print1=#()fprintf('1\n');
>> print2=#()fprintf('2\n');
>> call12(print1,print2)
1
2
Calling more functions
To call 3 functions, you could write
call1(print3, call1(print2, call1(print1,print2)));
4 functions:
call1(print4, call1(print3, call1(print2, call1(print1,print2))));
For more functions, continue the nesting pattern.
Passing Arguments
If you need to pass arguments, you can write a version of call1 that takes arguments and then make the obvious modification to call12.
call1arg1 = #(a,arg_a,b) a(arg_a);
call12arg1 = #(a, arg_a, b, arg_b) call1arg1(b, arg_b, call1arg1(a, arg_a, b))
You can also make versions of call1 that take multiple arguments and mix and match them as appropriate.
It is possible, using the curly function which is used to create a comma separated list.
curly = #(x, varargin) x{varargin{:}};
f=#(x)curly({exp(x),log(x)})
[a,b]=f(2)
If functionCall1() and functionCall2() return something and those somethings can be concatenated, then you can do this:
>> foo = #() [functionCall1(), functionCall2()]
or
>> foo = #() [functionCall1(); functionCall2()]
A side effect of this is that foo() will return the concatenation of whatever functionCall1() and functionCall2() return.
I don't know if the execution order of functionCall1() and functionCall2() is guaranteed.

How to modify parent scope variable using Powershell

I'm fairly new to powershell, and I'm just not getting how to modify a variable in a parent scope:
$val = 0
function foo()
{
$val = 10
}
foo
write "The number is: $val"
When I run it I get:
The number is: 0
I would like it to be 10. But powershell is creating a new variable that hides the one in the parent scope.
I've tried these, with no success (as per the documentation):
$script:$val = 10
$global:$val = 10
$script:$val = 10
But these don't even 'compile' so to speak.
What am I missing?
You don't need to use the global scope. A variable with the same name could have been already exist in the shell console and you may update it instead. Use the script scope modifier. When using a scope modifier you don't include the $ sign in the variable name.
$script:val=10
The parent scope can actually be modified directly with Set-Variable -Scope 1 without the need for Script or Global scope usage. Example:
$val = 0
function foo {
Set-Variable -scope 1 -Name "Val" -Value "10"
}
foo
write "The number is: $val"
Returns:
The number is: 10
More information can be found in the Microsoft Docs article About Scopes. The critical excerpt from that doc:
Note: For the cmdlets that use the Scope parameter, you can also refer to scopes by number. The number describes the relative position of one scope to another. Scope 0 represents the current, or local, scope. Scope 1 indicates the immediate parent scope. Scope 2 indicates the parent of the parent scope, and so on. Numbered scopes are useful if you have created many recursive scopes.
Be aware that recursive functions require the scope to be adjusted accordingly:
$val = ,0
function foo {
$b = $val.Count
Set-Variable -Name 'val' -Value ($val + ,$b) -Scope $b
if ($b -lt 10) {
foo
}
}
Let me point out a third alternative, even though the answer has already been made. If you want to change a variable, don't be afraid to pass it by reference and work with it that way.
$val=1
function bar ($lcl)
{
write "In bar(), `$lcl.Value starts as $($lcl.Value)"
$lcl.Value += 9
write "In bar(), `$lcl.Value ends as $($lcl.Value)"
}
$val
bar([REF]$val)
$val
That returns:
1
In bar(), $lcl.Value starts as 1
In bar(), $lcl.Value ends as 10
10
If you want to use this then you could do something like this:
$global:val=0
function foo()
{
$global:val=10
}
foo
write "The number is: $val"
Perhaps the easiest way is to dot source the function:
$val = 0
function foo()
{
$val = 10
}
. foo
write "The number is: $val"
The difference here being that you call foo via . foo.
Dot sourcing runs the function as normal, but it runs inside the parent scope, so there is no child scope. This basically removes scoping. It's only an issue if you start setting or overwriting variables/definitions in the parent scope unintentionally. For small scripts this isn't usually the case, which makes dot sourcing really easy to work with.
If you only want a single variable, then you can use a return value, e.g.,
$val = 0
function foo()
{
return 10
}
$val = foo
write "The number is: $val"
(Or without the return, as it's not necessary in a function)
You can also return multiple values to set multiple variables in this manner, e.g., $a, $b, $c = foo if foo returned 1,2,3.
The above approaches are my preferred way to handle cross-scope variables.
As a last alternative, you can also place the write in the function itself, so it's writing the variable in the same scope as it's being defined.
Of course, you can also use the scope namespace solutions, Set-Variable by scope, or pass it via [ref] as demonstrated by others here. There are many solutions in PowerShell.