I have to revamp a SAS project in which there are macro variables such as the following:
%let myDate = intnx('month',today(),-1);
and later...
data temp;
a = &myDate;
run;
I'm inclined to use %sysfunc instead:
%let myDate = %sysfunc(intnx(month,%sysfunc(today()),-1));
But I'm wondering... is this a matter of preference or is there some sound reason to prefer one method over the other?
For extremely large datasets you might find that, with this particular example, calling intnx once for every row to compute the same value performs noticeably worse than defining a macro variable once and re-using it indefinitely.
Rob Penridge has demonstrated that in this instance the overhead is likely to be negligible, but for more computationally intensive code that will not always be the case. Your mileage may vary.
More generally, when you start storing code rather than just constants within macro variables, you have to start thinking quite carefully about certain things:
What sort of macro quoting might or might not be required (particularly in a SAS/CONNECT environment when you need to rsubmit blocks of code using macro variables)
Whether or not your macro variable contains semicolons or other characters that might cause unexpected interactions with other blocks of code
Whether it contains macro references that you don't want to resolve until the code executes
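To illustrate the last point, here is a minimal sketch (the names greeting, code1 and code2 are made up for the example): %nrstr stops a macro reference from resolving at assignment time, and %unquote resolves it later.
%let greeting = Hello;
%let code1 = &greeting;              /* resolves now: stores the text Hello */
%let code2 = %nrstr(&greeting);      /* stores the unresolved text &greeting */
%let greeting = Goodbye;
%put code1 is &code1;                /* Hello   - captured at assignment time */
%put code2 is %unquote(&code2);      /* Goodbye - resolved only when unquoted */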
You also need to consider the question of timing. If you use %sysfunc(), the function runs when the macro variable is created. If you just store the function call in the macro variable, the function does not execute until the data step runs; and in this case, since it is calling the today() function, it will run for every observation. If you start your data step just before midnight on the last day of the month, you could end up with different values of A on different observations in the same dataset.
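To make the timing difference concrete, here is a minimal sketch (earlyDate, lateDate and demo are names made up for the example, not from the original project):
%let earlyDate = %sysfunc(intnx(month,%sysfunc(today()),-1)); /* resolved once, right now */
%let lateDate = intnx('month',today(),-1);                    /* just text; runs inside the step */
data demo;
   do i = 1 to 3;
      resolved_now   = &earlyDate;  /* the same constant on every iteration */
      resolved_later = &lateDate;   /* expands to the intnx/today call, so it executes on every iteration */
      output;
   end;
   format resolved_now resolved_later date9.;
run;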
You are talking about two different things here. The first code, with %LET plus a DATA step, creates a macro variable myDate without executing the INTNX function (it just stores the text of the call), and then creates a table with a column a.
The second, revamped %LET statement creates just a macro variable whose value is the result of the executed INTNX function.
So, it actually depends on what the business requirement is:
to create a table,
OR
to create a macro variable once and use it over and over again.
It's a matter of personal preference. Personally, I like the %sysfunc approach, as it lets me debug/print the result without having to run any data steps.
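For example, with the %sysfunc version the resolved value can be inspected straight from the log; a minimal sketch:
%let myDate = %sysfunc(intnx(month,%sysfunc(today()),-1));
%put NOTE: myDate = &myDate (%sysfunc(putn(&myDate, date9.)));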
Really I'd say whatever allows for better readability and maintainability. If you're working with people that hate using macros then consider using the simpler first approach. Horses for courses.
I'm updating a program that uses Hash tables. And the program has a lot of repeated variables that are changed on a regular basis.
I've been updating them to reference a program containing macro code for these variables so we don't have to change them one by one. It's been working like a charm in all my other projects, but I'm struggling with these hash tables, e.g.:
%let year = x2020;
data acute &year.;
if 0 then set a_&syear.;
if _n_=1 then do;
declare has Tx(dataset:"Ty");
Tx.defineData("city","&year.");
Tx.define();
call missing(city,&year.);
end;
....
run;
I've narrowed it down to the use of &year. in the Tx.defineData line.
It doesn't seem to be resolving the macro variable inside the quotation marks, and I am given this error:
undeclared data symbol &year. for has object
though I usually don't have issues with macros inside quotes.
I've tried changing the %let statement to %let year = "x2020"; and using dequote() in the places that don't need the quotation marks. I've also tried using quote(&year.) instead, but I get:
undeclared data symbol" ." for hash object....
Has anyone worked out a way to use a %let macro variable in this scenario?
Your main problem is that you are trying to use dataset and variable names that start with a digit. Names in SAS should start with a letter or an underscore and contain only letters, underscores, and digits. If they don't, they have to be written differently:
Write "name with blanks"n to create a variable or dataset named name with blanks
Write "1_2_3"n to create a variable or dataset named name with blanks
Now that works, but I advise you not to do so, because the syntax becomes quite complex.
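For completeness, here is a minimal sketch of that name-literal syntax; it only works when the corresponding system options are set (validvarname=any for variable names, validmemname=extend for dataset names):
options validvarname=any validmemname=extend;
data "acute 2020"n;     /* dataset name containing a blank */
   "1_2_3"n = 1;        /* variable name starting with a digit */
run;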
Your code is incomplete and contains a lot of typos.
Therefore, I have to guess what you actually wanted to do.
It would help if you copied and pasted it for us, or better, copied and pasted the log, so we know exactly what you did.
The data step
I assume that with data acute &year.; you only wanted to create one dataset, named acute 2020. If so, you should have written data "acute &year."n;, but I actually advise you to rename your dataset and write data acute_&year.;.
You might also have wanted to create the datasets acute and 2020 in one data step. Then you should have written data acute "&year."n;.
The declaration of the hash table
First, it is not declare has but declare hash, and it is not Tx.define(); but Tx.defineDone();.
If with Tx.defineData("city","&year."); you wanted to specify that the fields city and 2020 should be used as data, that should work, because here you specify the variable names as strings, not as SAS names.
The error is actually in call missing(city,&year.); Here you should use the special syntax call missing(city,"&year."n);
Again, I advise you to rename your variable, for instance to _2020, so you can just write it as _&year.
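Putting that advice together, a rough sketch of how the step could look after renaming. It assumes you rename the output dataset to acute_&year., the variable to _&year., and that source tables a_&year. and Ty exist with the relevant columns; the defineKey call is my addition, because a hash object needs at least one key and the original snippet elides that part.
%let year = 2020;
data acute_&year.;
   if 0 then set a_&year.;            /* assumed source table supplying city and _&year. */
   if _n_ = 1 then do;
      declare hash Tx(dataset: "Ty"); /* assumes a dataset Ty containing these columns */
      Tx.defineKey("city");           /* at least one key is required */
      Tx.defineData("city", "_&year.");
      Tx.defineDone();
      call missing(city, _&year.);
   end;
   /* ... rest of the original step ... */
run;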
I am defining a variable at the beginning of my source code in MATLAB. Now I would like to know at which lines this variable affects something. In other words, I would like to see all lines in which that variable is read. This includes not only all accesses in the current function, but also possible accesses in sub-functions that use this variable as an input argument. That way, I can quickly see where a change to this variable has any influence.
Is there any possibility to do so in MATLAB? A graphical marking of the corresponding lines would be nice but a command line output might be even more practical.
You can always use "Find Files" to search for a certain keyword or expression. In my R2012a/Windows version it is in Edit > Find Files..., with the keyboard shortcut [CTRL] + [SHIFT] + [F].
The result will be a list of lines where the searched string is found, in all the files found in the specified folder. Please check out the options in the search dialog for more details and flexibility.
Later edit: thanks to @zinjaai, I noticed that @tc88 also wants this tool to track the effect of the variable inside the functions/sub-functions. I think this is:
1. Very difficult to achieve. The problem of running through all the possible values and branching on every possible conditional expression is... well, it is hard. I think it is halting-problem-hard.
2. In 90% of the cases the assumption that the output of a function is influenced by its input is true. But the input and the output are part of the same statement (assigning the result of a function), so looking for where the variable is used as an argument should suffice to identify which output variables are affected.
3. There are perverse cases where functions alter arguments that are handle-type (because the argument is not copied, but referenced). This side effect breaks assumption 2, and is one of the main reasons for point 1. Outlining the cases when these side effects take place is, again, hard, and it is better to assume that all of them are modified.
4. Some other cases are inherently undecidable, because they depend not on the computer's state, but on the state of the "outside world". Example: suppose one calls uigetfile. The function returns a char type when the user selects a file, and a double type when the user chooses not to select a file. Obviously the two cases will be treated differently. How could you know which variables are created/modified before the user decides?
In conclusion: I think that human intuition, plus the MATLAB debugger (for run time), Find Files (for quickly searching where a variable is used), and depfun (for quickly identifying function dependencies), is way cheaper. But I would like to be wrong. :-)
As documented, using run on a string runs into problems when local variables (including procedure parameters) are involved. So, what is the recommended way to achieve the following goal?
I have parameter tables (p1, p2, etc) specifying values for global variables, by name (i.e., table keys are strings, corresponding to the names of the global variables). Given a name (as a string) and a parameter table, I want to set the named global variable to the table value. E.g., if we were to use run, and if the values were all numbers, we might do something like this:
to update [#nm #tbl]
let %tval table:get #tbl #nm
run (word "set " #nm " " %tval)
end
What is the recommended way to do this, avoiding strings (due to the warning in the docs)?
As an additional complication, some table values may be tasks.
Extension of the question:
Following up on my comment of Oct 9, I found that if I isolate the assignment in a procedure, I can also successively make global assignments with tasks. E.g.,
to setGlobalTasks [#name #table]
;; #name : string, name of global variable
;; #table : table, maps names (strings) to values (reporter tasks)
let %tval table:get #table #name
run (word "set " #name " %tval")
end
Seth has provided some assurance that proceeding this way will continue to work in NetLogo when the assigned values are numbers. Is this going to be risky when the assigned values are tasks? Does it pose any risks in NetLogo 5.1?
Note: probably this extension of the question should be in the comments, but I could not format code blocks in a comment.
Your original approach, where %tval is outside of double quotes, only works with table values that are ordinary values like numbers, lists, or strings, values that can survive a round trip to string and back. (If you had trouble in practice, my guess would be that run got confused when you tried to combine it with foreach, as in the code you posted at http://groups.google.com/forum/#!topic/netlogo-devel/m5rnPEsxR44 . I believe this can be worked around by writing a standalone procedure like the one in your question.)
Your revised code, where %tval is inside the double quotes, and the whole thing is isolated in a separate procedure, is correct and should work for all possible table values. It should work fine in both NetLogo 5.0 and 5.1, and almost certainly in 6.0 too if there ever is a 6.0.
(In Tortoise, it wouldn't work, since run probably won't support strings at all in Tortoise.)
Tangent on reflection:
run on strings is kinda ugly. In situations where you want "reflective" setting of variables by name, where the name is stored in a string computed at runtime, it would be nice if there were an extension that supported this directly. The good news is, the code for the extension would be quite short and simple. The required methods within NetLogo, that the extension would call, already exist. The bad news is, writing a NetLogo extension of any kind is only easy if you're comfortable writing and compiling simple Java (or Scala) code. Anyway, it would be great if such a "reflection" extension existed. But in the meantime, you're safe with what you have.
I have a rather bulky program that I've been running as a script from the MATLAB command line. I decided to clean it up a bit with some nested functions (I need to keep everything in one file), but in order for that to work it required me to also make the program itself a function. As a result, the program no longer runs in the base workspace like it did when it was a script. This means I no longer have access to the dozens of useful variables that used to remain after the program runs, which are important for extra calculations and information about the run.
The suggested workarounds I can find are to use assignin, evalin, define the variables as global, or set the outputs in the definition of the now-functionized program. None of these solutions appeal to me, however, and I would really like to find a way to force the function's workspace to be the base workspace. Does any such workaround exist? Or is there any other way to do this that doesn't require me to manually define or label each specific variable I want to get out of the function?
Functions should clearly define their input and output variables. Organizing the code differently will be much more difficult to understand and to modify later on. In the end, it will most likely cost you more time to work with an unorthodox style than to invest in some restructuring.
If you have a huge number of output variables, I would suggest organizing them in structure arrays, which might be easy to handle as output variables.
The only untidy workaround I can imagine would use whos, assignin and eval:
function your_function()
    x = 'hello' ;
    y = 'world' ;
    variables = whos ;
    for k = 1:length(variables)
        assignin('base', variables(k).name, eval(variables(k).name))
    end
end
But I doubt that this will help with the aim to clean up your program. As mentioned above I suggest ordering things manually in structures:
function out = your_function()
    x = 'hello' ;
    y = 'world' ;
    out.x = x ;
    out.y = y ;
end
If the functions you would like to define are simple and have a single output, one option is to use anonymous functions.
Another option is to store all the variables you would like to use afterwards in a struct and have your big function return this struct as an output:
function AllVariables = GlobalFunction(varargin)
    % bunch of stuff
    AllVariables = struct('Variable1', Variable1, 'Variable2', Variable2, …);
end
I encountered usage of the function eval(expression) in somebody else's code in MATLAB:
for example:
for n = 1 : 4
sn = int2str( n) ;
eval( [ 'saveas( fig' sn ', [ sName' sn ' ], ''fig'' ) ' ] );
end
MathWorks staff point out in the MATLAB Help that:
Many common uses of the eval function are less efficient and are more difficult to read and debug than other MATLAB functions and language constructs.
After this, I found that this function also exists in many other programming languages, such as Python, JavaScript, and PHP.
So I have a few questions:
Will usage of this function affect the performance of my code?
If it slows down execution, why does that happen?
If it slows down execution every time it is called, what is the reason for using this function at all?
The eval function is dangerous and there are very few examples where you actually need it. For example, the code you gave can easily be rewritten if you store the figure handles in an array fig(1), fig(2), etc. and the file names in a cell array sName{1}, sName{2}, etc., and write
for n = 1:4
    saveas(fig(n), sName{n}, 'fig');
end
which is clearer, uses fewer characters, can be analysed by the Matlab editor's linter, is more easily modifiable if (when) you need to extend the code, and is less prone to weird bugs.
As a rule of thumb, you should never use eval in any language unless you really know what you are doing (i.e. you are writing a complicated Lisp macro or something else equivalent to manipulating the AST of the language - if you don't know what that means, you probably don't need to use eval).
There are almost always clearer, more efficient and less dangerous ways to achieve the same result. Often, a call to eval can be replaced with some form of recursion, a higher-order function or a loop.
Using eval here will certainly be slower than a non-eval version, but most likely it won't be a bottleneck in your code. However, performance is only one issue; maintenance (including debugging) and readability are others.
The slowdown occurs because Matlab uses a JIT compiler, and eval lines cannot be optimized.
Eval use is in most cases due to lack of knowledge about the Matlab functionality that would be appropriate instead. In this particular case, the issue is that the figure handles are stored in variable names called fig1 through fig4. If they had been stored in an array called fig, i.e. fig(1) etc, eval would have been unnecessary.
EDIT: Here are two excellent articles by Loren Shure about why eval should be avoided in MATLAB: Evading eval, and More on eval.
For the most part, the slowdown occurs because the string has to be parsed into actual code. This isn't such a major issue if used sparingly, but if you ever find yourself using it in code that loops (either an explicit loop or things like JavaScript's setInterval()) then you're in for a major performance drop.
Common uses I've seen for eval that could be done better are:
Accessing property names dynamically, in ignorance of [] notation
Calling functions based on an argument name, which could instead be done with a switch (safer, prevents risk of code injection)
Accessing variables named like var1, var2, var3 when they should be arrays instead
To be honest, I don't think I've ever encountered a single situation where eval is the only way to solve a problem. I guess you could liken it to goto in that it's a substitute for program structure, or useful as a temporary solution to test a program before spending the time making it work optimally.
Here is another implication:
When you compile a program that uses eval, you must put pragmas that tell the compiler that some functions are needed. For example:
This code will compile and run well:
foo()
But this one needs a pragma added:
%#function foo
eval('foo()')
Otherwise you will encounter a runtime problem.