The logic of macros

I am new to Stata.
Let's say we have a dataset with train and re78 as variables.
Why does this code work
sum train
local a= r(N)*r(mean)
regress re78 train
outreg2 using TABLE_2.xls, addstat(A, `a') excel
but not this one?
sum train
local a= r(N)*r(mean)
sum `a'
Both pieces of code are meant to call the local variable a.

In Stata the term variable is reserved for columns in the dataset. Local macros are called that, not local variables.
Why does your second code fail? After sum train the local macro a is calculated as r(N) * r(mean) and so should contain the sum or total of values from the last calculation, the application of summarize. (You could also just use r(sum).)
Let's suppose that after that your macro contains 42.
Then
sum `a'
is interpreted as
sum 42
and the problem then has nothing to do with using a local macro. The problem is that summarize has nothing legal to do there. The minimum legal syntax for summarize is to specify variable names or no variable names at all, which is interpreted as meaning all variables. But 42, or whatever your sum is, fits neither syntax, so it is illegal.
I am not clear what you want this syntax to do, but it is not legal.
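To make the substitution step concrete, here is a rough Python analogy (it is not Stata). The helper names expand_macros and summarize_args_ok are hypothetical, and so is the toy rule that an argument is legal only if it names a variable in the dataset. The only point is that the macro reference is replaced by its text before the command is parsed.

```python
import re

def expand_macros(command, macros):
    """Replace every `name' reference with the text stored under that name."""
    # Hypothetical helper: mimics the textual substitution of local macros.
    # An undefined macro expands to an empty string.
    return re.sub(r"`(\w+)'", lambda m: macros.get(m.group(1), ""), command)

def summarize_args_ok(command, variables):
    """Toy check: accept no arguments at all, or only variable names."""
    args = command.split()[1:]
    return all(arg in variables for arg in args)

variables = {"train", "re78"}   # columns in the dataset
macros = {"a": "42"}            # assumption: the macro a holds the text 42

expanded = expand_macros("sum `a'", macros)
print(expanded)                                   # sum 42
print(summarize_args_ok(expanded, variables))     # False: 42 is not a variable
print(summarize_args_ok("sum train", variables))  # True
```

In the first piece of code the same substitution happens inside addstat(), where a number is exactly what is expected, which is why that version works.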


Why do I get different results in different versions of MATLAB (2016 vs 2021)?

Why do I get different results when running the same code in different versions of MATLAB (2016 vs 2021) for sum(b.*x1), where b is single and x1 is double? How can I avoid such differences between MATLAB versions?
MATLAB 2021:
sum(b.*x1)
ans =
single
-0.0013286
MATLAB 2016:
sum(b.*x1)
ans =
single
-0.0013283
In R2017b, they changed the behavior of sum for single-precision floats, and in R2020b they made the same changes for other data types too.
The change speeds up the computation and improves accuracy by reducing the rounding errors. Simply put, previously the algorithm would just run through the array in sequence, adding up the values. The new behavior computes the sum over smaller portions of the array, and then adds up those results. This is more precise because the running total can become a very large number, and adding smaller numbers to it causes more rounding in those smaller numbers. The speed improvement comes from loop unrolling: the loop now steps over, say, 8 values at a time, and in the loop body, 8 running totals are computed (they don't specify the number they use; the 8 here is just an example).
Thus, your newer result is a better approximation to the sum of your array than the old one.
For more details (a better explanation of the new algorithm and the reason for the change), see this blog post.
Regarding how to avoid the difference: you could implement your own sum function, and use that instead of the builtin one. I would suggest writing it as a MEX-file for efficiency. However, do make sure you match the newer behavior of the builtin sum, as that is the better approximation.
Here is an example of the problem. Let's create an array with N+1 elements, where the first one has a value of N and the rest have a value of 1.
N = 1e8;
a = ones(N+1,1,'single');
a(1) = N;
The sum over this array is expected to be 2*N. If we set N large enough w.r.t. the data type, I see this in R2017a (before the change):
>> sum(a)
ans =
single
150331648
And I see this in R2018b (after the change for single-precision sum):
>> sum(a)
ans =
single
199998976
Both implementations make rounding errors here, but one is obviously much, much closer to the expected result (2e8, or 200000000).
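The effect can be reproduced outside MATLAB. The following Python sketch only illustrates the idea described above and is not MATLAB's actual algorithm. The helper f32 (which rounds to single precision through the struct module), the choice of 8 partial totals, and the smaller N of 2^24 (chosen to keep the loop fast) are all assumptions made for the illustration.

```python
import struct

def f32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sequential_sum(values):
    """One running total, rounded to single precision after every addition."""
    total = 0.0
    for v in values:
        total = f32(total + v)
    return total

def blocked_sum(values, lanes=8):
    """Keep `lanes` partial totals, then add the partial totals together."""
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] = f32(partial[i % lanes] + v)
    return sequential_sum(partial)

n = 2 ** 24                     # beyond this, single precision skips some integers
a = [float(n)] + [1.0] * 1000   # the exact sum is n + 1000
exact = n + 1000

print(sequential_sum(a) - exact)  # -1000.0: every added 1 was rounded away
print(blocked_sum(a) - exact)     # a much smaller error
```

Only the ones that land in the same partial total as the large first element are lost in the blocked version, so its error is a fraction of the sequential one.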

What is the difference between defining a vector using linspace and defining a vector using steps?

I am trying to learn the basics of MATLAB.
I wanted to write a MATLAB script.
In this script I defined a vector x with a step d of size 2*pi/1000,
and I wanted to plot two sin functions of x:
the first sin has a frequency of 1, and the second has a frequency of 10.3.
This is what I did:
d=(2*pi/1000);
x=-pi:d:pi;
first=sin(x);
second=sin(10.3*x);
plot(x,first,x,second);
My question:
What is the difference between:
x=linspace(-pi,pi,1000);
and
d=(2*pi/1000);
x=-pi:d:pi;
I am asking because I thought the two were the same, but I suspect there is something wrong with my assumption.
Also, is there a more efficient way to write a sin function with a given frequency?
The main difference can be summarized as predefined size vs predefined step. And your example highlights it very well, indeed (1000 elements vs 1001 elements).
The linspace function produces a fixed-length vector (the length being defined by the third input argument, which defaults to 100) whose lower and upper limits are set, respectively, by the first and the second input arguments. The correct step to use is internally computed by the function itself (step = (x2 - x1) / (n - 1)).
The colon operator defines a vector of elements whose values range between the specified lower and upper limits. The step, which is an optional parameter that defaults to 1, determines the length of the vector. This means that the length of the result is determined by the number of steps that must be taken in order to reach the upper limit, starting from the lower one. As a side note, on this MathWorks thread you can find a very interesting discussion concerning the behavior of the colon operator with respect to floating-point arithmetic.
Another difference, related to the first one, is that linspace always includes the upper limit value, while the colon operator only contains it if the specified step allows it (0:5:14 = [0 5 10]).
As a general rule, I prefer to use the former when I want to produce a vector of a predefined length (pretty obvious, isn't it?), and the latter when I need to create a sequence whose length has only a marginal relevance (or no relevance at all).
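The size-versus-step distinction can be sketched in a few lines of Python. The functions linspace and colon below are hypothetical stand-ins written for this illustration, not MATLAB's implementations. In particular, the small tolerance in colon is an assumption that loosely mimics how the colon operator copes with floating-point round-off, and only positive steps are handled.

```python
import math

def linspace(start, stop, n=100):
    """n evenly spaced points from start to stop, both endpoints included."""
    if n == 1:
        return [stop]
    step = (stop - start) / (n - 1)       # n points means n - 1 intervals
    points = [start + i * step for i in range(n)]
    points[-1] = stop                     # pin the last point to the upper limit
    return points

def colon(start, step, stop, tol=1e-10):
    """Points start, start+step, ... that do not pass stop (step > 0)."""
    count = math.floor((stop - start) / step + tol) + 1
    return [start + i * step for i in range(max(count, 0))]

d = 2 * math.pi / 1000
print(len(linspace(-math.pi, math.pi, 1000)))  # 1000
print(len(colon(-math.pi, d, math.pi)))        # 1001
print(colon(0, 5, 14))                         # [0, 5, 10]
```

With 1000 points there are only 999 intervals, so the linspace step is 2*pi/999 rather than 2*pi/1000, which is why the two vectors in the question are not the same.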

How can I specify different ranges for different variables in vpasolve command in MATLAB?

I am solving a set of equations in MATLAB using vpasolve and I would like to give it search ranges for the variables. I know how to do it for one variable, but when I try to give three ranges for three variables, I do not know how it is done. Could anyone help, please?
You can specify the allowed range for each variable using an n-by-2 matrix as the last argument, i.e. as the initial guess. From the documentation:
If init_guess is a matrix with two columns, then the two entries of
the rows specify the bounds of a search range for the corresponding
variables. To specify a starting point in a matrix of search ranges,
specify both columns as the starting point value.
So, initial_guess for 3 (=n) input variables would be:
initial_guess = [x1_start x1_end;
x2_start x2_end;
x3_start x3_end;]

Use Sum Indexed and Lambdify, and scipy to minimize a large expression

I have a huge expression of around 231 terms. Each term has some power of cos(e) or sin(e), and these can be mixed as well. Each term also has an r (distance) term in the denominator, raised to some power.
Here is a small portion of the expression
What I'd like to do is sum the expression over all angle e's and then over all r's and use lambdify and scipy to minimize the expression with respect to 4 other parameters present in the equation.
Things I tried
I have tried to do the sums using Sum and Indexed in sympy but am not able to make it work; the power bit is tricky. Also, once I have the indexed sum and expand it, how do I pass the list of angle values at which to evaluate the expression?
Also since the expression is pretty large I'd like to do the sum indexing etc. in a loop without individually resolving expression for each power.
(If my question is not clear, let me know.)
This is how I finally managed to solve my problem:
Replaced cos(e) and sin(e) with the variables cose and sine.
Iterated over the list of angles, used sympy's subs to replace cose and sine with math.cos(e) and math.sin(e), and kept adding the resulting expressions. Did the same for r.
This left me with only p1, p2, Q1 and Q2, as required.
I couldn't use sympy's lambdify and indexed sums, but this got the job done.

MATLAB Murphy's HMM Toolbox: Inconsistent Output Sequence and Label Statesname and Symbols

Hi, I have been using Murphy's HMM toolbox with Gaussian mixture outputs. In brief, I have 2 datasets for training. Each dataset comprises 2000 observations with 11 dimensions per observation. I implemented the following steps to observe the path sequence output.
N_states=2
N_Gaussian_Mixture=1
For each of the dataset, a HMM model was generated. The steps are:
Step 1: mixgauss_init() was used to generate a GMM signature for my training data.
Step 2: After declaring the matrices for Prior and Transmat, mhmm_em() was used to generate an HMM model for the training dataset.
Testing: 2 test data from each of the datasets are used for testing using mhmm_logprob(). The outputs were correctly predicted using log-likelihood scores in every run.
However, when I tried to observe the sequence of the HMM modelling (Dataset_123 with testdata_123) via mixgauss_prob() followed by viterbi_path(), the output sequences were inconsistent. For example, for the first run, the output sequence can be 2221111111111. But when I rerun the program, the sequence can change to 1111111111111 or 1111111111222. Initially I thought it could be due to my Prior matrix. I fixed the Prior value but it did not help.
Secondly, is there a way to assign labels to the states and the sequence, like in the MATLAB function hmmgenerate?
hmmgenerate(...,'Symbols',SYMBOLS) specifies the symbols that are emitted. SYMBOLS can be a numeric array or a cell array of the names of the symbols. The default symbols are integers 1 through N, where N is the number of possible emissions.
hmmgenerate(...,'Statenames',STATENAMES) specifies the names of the states. STATENAMES can be a numeric array or a cell array of the names of the states. The default state names are 1 through M, where M is the number of states.
Thank you for your time. I hope to hear from the experts.