Stata - Marginal effects - margins

Could anybody briefly go through the program mymargin and check it for inconsistencies and possible extensions or improvements? The program should estimate the marginal effect of any variable in the sample selected by if and in, and it should build on the saved results from a previous logit regression.
capture program drop mymargin
program mymargin, rclass
version 10.1
syntax varlist(max=1 numeric) [if] [in] [, atmean]
marksample touse
quietly count if `touse' == 1
if `r(N)' == 0 {
error 301
}
local res me mme
tempname `res'
quietly summarize `varlist' if `touse'==1, detail
scalar `me' = // TBU
display as txt "Marginal effect of" `var' ": " `me'
if "`atmean'" == "atmean" {
matrix r = r(stats)
scalar `me' = normalden(_b[sex]*r[2,1]+_b[ageinyears]*r[3,1] +_b[meduc]*r[4,1] +_b[hhinc]*r[5,1] +_b[area]*r[6,1] + _b[_cons])
}
display as txt "Mean marginal effect of" `var' ": " `me'
foreach r of local res {
return scalar `r' = ``r''
}
return scalar N = r(N) // return the number of observations
return local var `varlist' // return the name of the variable
return // TBU, return the (mean) marginal effect
end
mymargin hhinc
return list
display me
display mme

This would get a failing grade from me, if only because of the wired-in assumptions that you have five specified predictors in a specified order:
scalar `me' = normalden(_b[sex]*r[2,1]+_b[ageinyears]*r[3,1] +_b[meduc]*r[4,1] +_b[hhinc]*r[5,1] +_b[area]*r[6,1] + _b[_cons])
Also, the summarize command will zap any r(stats) left behind by any previous command.
Regardless of Stack Overflow's general policy, that's as far as I will go personally in helping with what appears to be an assignment for a course. See your previous post about a week ago.

Huffman Encoding: How to find the path?

I have a tree with the most frequent letters at the top of the tree and I am trying to find the path to a node so I can turn a message into binary. In this program if I go left, I add 0 to path and, if I go right, I add 1 to the path until I find the node. But I have to go straight to the desired node which is not possible. The only thing I could think of is removing the last character or path if a node has no children, but it does not work if a node has grandchildren. Can someone help me on how to approach this? Thanks!
// global variables
String path;
int mark;
// encodes a message to binary
String encoder(char data) {
path = "";
mark = 0;
findPath(root, data);
return path;
}
// finds the path to a node
void findPath(TNode node, char data) {
if(node.data == data) {
mark = 1;
return;
}
if(mark==0 && node.left != null) {
path += 0;
findPath(node.left, data);
}
if(mark==0 && node.right != null) {
path += 1;
findPath(node.right, data);
}
if(mark==0 && node.left == null || node.right == null) {
path = path.substring(0, path.length() - 1);
}
}
This is not how Huffman coding works. Not at all. You assign a code to every symbol, and the codes can have different lengths.
An algorithm to determine the codes: take the two symbols with the lowest frequencies, say a and b. Combine them into one symbol x whose frequency is the sum of the two. When we are finished, the code for a is the code for x plus a zero, and the code for b is the code for x plus a one. Then look again for the two symbols with the lowest frequencies, combine them, and so on. When you are down to two symbols, give them the codes 0 and 1, then work backwards to find the codes for all the other symbols.
Example: a, b, c, d with frequencies 3, 47, 2 and 109. Combine a and c into x with frequency 5. Combine x and b into y with frequency 52. Then d = code 0 and y = code 1, so x = code 10 and b = code 11, and finally a = code 100 and c = code 101.
You would not encode messages using the tree. You should instead traverse the entire tree once recursively, generating all of the codes for all of the symbols, and make a table of the symbols and their associated codes. Then you use that table to encode your messages.
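The table-based approach described above could look roughly like the following. This is a minimal sketch in Python rather than the asker's Java, and the names build_codes and encode are made up for illustration. Which child receives 0 and which receives 1 is arbitrary, so the exact bits can differ from a hand-worked example while the code lengths stay the same.

```python
import heapq
from itertools import count


def build_codes(freqs):
    """Return a {symbol: bitstring} table for a dict of symbol frequencies.

    Assumes symbols are not tuples (e.g. single-character strings),
    because tuples are used here to represent internal tree nodes.
    """
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    # Each heap entry is (frequency, tiebreak, tree); a tree is either a
    # symbol (leaf) or a (left, right) pair (internal node).
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        return {heap[0][2]: "0"}  # a lone symbol still needs one bit

    # Repeatedly merge the two least frequent trees.
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (t1, t2)))

    # Traverse the finished tree once, recording the path to every leaf.
    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix

    walk(heap[0][2], "")
    return codes


def encode(message, codes):
    """Encode a message by looking each symbol up in the table."""
    return "".join(codes[ch] for ch in message)


codes = build_codes({"a": 3, "b": 47, "c": 2, "d": 109})
print(codes)
print(encode("dab", codes))
```

With the frequencies 3, 47, 2 and 109, d gets a 1-bit code, b a 2-bit code, and a and c 3-bit codes. Encoding a message is then a dictionary lookup per character, with no tree search at all.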

Try and Except statement - Automate The Boring Stuff {collatz() program}

I have been trying to complete a task from Automate the Boring Stuff.
This is the task: "Write a function named collatz() that has one parameter named number. If number is even, then collatz() should print number // 2 and return this value. If number is odd, then collatz() should print and return 3 * number + 1.
Then write a program that lets the user type in an integer and that keeps calling collatz() on that number until the function returns the value 1. Add try and except statements to the previous project to detect whether the user types in a noninteger string."
def collatz(num):
    ev_odd = num % 2  # detects whether number is odd or even
    if ev_odd == 1:
        num = num * 3 + 1
    else:
        num = num//2
    print(num)
    global number
    number = num

#the program
print('enter an integer')
number = int(input())
collatz(number)
while number != 1:
    collatz(number)
I wrote this code and it works fine, but I am unable to add try and except statements to it. Please help me out. Other recommendations to improve this code are welcome.
Regards
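One possible way to add the try and except handling is sketched below. It is not the book's reference solution. The helper names parse_positive_int, run and main are made up for illustration, and rejecting zero and negative numbers is an added assumption, since the sequence never reaches 1 for those inputs. Here collatz() returns its result, as the task asks, so no global variable is needed.

```python
def collatz(number):
    """Print and return the next Collatz value for number."""
    if number % 2 == 0:
        result = number // 2
    else:
        result = 3 * number + 1
    print(result)
    return result


def parse_positive_int(text):
    """Return text as a positive int, or None if it is not one."""
    try:
        value = int(text)
    except ValueError:
        # int() raises ValueError for strings such as 'puppy' or '3.5'
        return None
    return value if value >= 1 else None


def run(text):
    """Call collatz() repeatedly until it returns 1; return the values seen."""
    number = parse_positive_int(text)
    if number is None:
        print("Please enter a positive integer.")
        return []
    values = []
    while True:
        number = collatz(number)
        values.append(number)
        if number == 1:
            return values


def main():
    # Interactive entry point: reads one line from the user.
    run(input("Enter an integer: "))
```

Calling main() starts the interactive version. Keeping the try and except inside a small parsing function means the loop itself never has to deal with bad input.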

Return " i " value when if-statement is true

I have created a matrix with row 1 full of strings and 4 other rows with numbers. They are created in a handle class with the object "Projekter".
So in the object "Projekter" row 1, the first value is blank, but the second value is 'Ole'. So I know that 'Ole' is in (1,2). x is the name/string I want to search for, which in this case is 'Ole'.
As you can see below, it should search row 1 from column 2 until the last name/string, and if the entry in column i is 'Ole', it should give me the value 2, because i should equal 2.
A is just a check on whether the function works, which at this point it doesn't.
The error it gives is "Undefined function 'eq' for input arguments of type 'cell'."
How do I fix this so that it returns the value of i when the condition is true?
Thank you in advance!
function number(obj,x)
A = [];
for i = 2:size(obj.Projekter,2)
if obj.Projekter(1,i)==x
A = A + 1;
end
end
disp(A)
end
Maybe you have to index the cell content:
your_cell = {'a_string'};
your_string = your_cell{1};
function [returnValue] = number(obj,x)
for i = 2:size(obj.Projekter,2)
if obj.Projekter{1,i}==x
returnValue = i;
return;
end
end
end
Note the change from obj.Projekter(1,i)==x to obj.Projekter{1,i}==x (use curly braces instead of parens). I have then specified that returnValue will hold the value that should be returned by doing function [returnValue] = number(obj,x). We then set returnValue equal to i and return from the function when the condition of the if statement is true.
As suggested in the comments, it is probably better to do:
function [returnValue] = number(obj, x)
returnValue = find(strcmp(x, obj.Projekter) == 1);
strcmp(x, obj.Projekter) will give you an array the size of obj.Projekter with 1's wherever the strings match and 0's where they don't; you can then find the indices that are set to 1. This has the added benefits of not using a loop, so it is faster, and of giving you every occurrence of a match, not just the first one.

Convert matlab symbol to array of products

Can I convert a symbol that is a product of products into an array of products?
I tried to do something like this:
syms A B C D;
D = A*B*C;
factor(D);
but it doesn't factor it out (mostly because that isn't what factor is designed to do).
ans =
A*B*C
I need it to work if A B or C is replaced with any arbitrarily complicated parenthesized function, and it would be nice to do it without knowing what variables are in the function.
For example (all variables are symbolic):
D = x*(x-1)*(cos(z) + n);
factoring_function(D);
should be:
[x, x-1, (cos(z) + n)]
It seems like a string parsing problem, but I'm not confident that I can convert back to symbolic variables afterwards (also, string parsing in matlab sounds really tedious).
Thank you!
Use regexp on the string to split based on *:
>> str = 'x*(x-1)*(cos(z) + n)';
>> factors_str = regexp(str, '\*', 'split')
factors_str =
'x' '(x-1)' '(cos(z) + n)'
The result factors_str is a cell array of strings. To convert it to a cell array of sym objects, use
N = numel(factors_str);
factors = cell(1,N); %// each cell will hold a sym factor
for n = 1:N
factors{n} = sym(factors_str{n});
end
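Note that splitting on every * also splits inside parentheses, so a factor such as (a*b + 1) would be cut in two. One way around this is to split only on the * characters that sit outside all parentheses. The following is a minimal sketch of that idea, written in Python rather than MATLAB; the function name split_top_level_factors is made up for illustration, and it assumes the expression uses ^ rather than ** for powers.

```python
def split_top_level_factors(expr):
    """Split an expression string on '*' characters outside all parentheses."""
    factors = []
    depth = 0   # current parenthesis nesting level
    start = 0   # index where the current factor begins
    for i, ch in enumerate(expr):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "*" and depth == 0:
            factors.append(expr[start:i].strip())
            start = i + 1
    factors.append(expr[start:].strip())
    return factors


print(split_top_level_factors("x*(x-1)*(cos(z) + n)"))
print(split_top_level_factors("x*(a*b + 1)"))
```

The first call yields the three factors x, (x-1) and (cos(z) + n); the second keeps (a*b + 1) in one piece. Each resulting string could then be converted back to a symbolic object as in the loop above.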
I ended up writing the code to do this in Python using sympy. I think I'm going to port the MATLAB code over to Python because I prefer that language. I'm not claiming this is fast, but it serves my purposes.
# Factors a sum of products function that is first order with respect to all symbolic variables
# into a reduced form using products of sums whenever possible.
# @param orig_exp A symbolic expression to be simplified
# @param depth Used to control indenting for printing
# @param verbose Whether to print or not
# Note: symvar, collect_terms and equals appear to be the author's own helpers;
# they are not shown in this post.
from sympy import expand, simplify

def factored(orig_exp, depth = 0, verbose = False):
    # Prevents sympy from doing any additional factoring
    exp = expand(orig_exp)
    if verbose: tabs = '\t'*depth
    terms = []
    # Break up the added terms
    while(exp != 0):
        my_atoms = symvar(exp)
        if verbose:
            print(tabs, "The expression is", exp)
            print(tabs, my_atoms, len(my_atoms))
        # There is nothing to sort out, only one term left
        if len(my_atoms) <= 1:
            terms.append((exp, 1))
            break
        (c,v) = collect_terms(exp, my_atoms[0])
        # Makes sure it doesn't factor anything extra out
        exp = expand(c[1])
        if verbose:
            print(tabs, "Collecting", my_atoms[0], "terms.")
            print(tabs, 'Separated terms with ', v[0], ', (', c[0], ')')
        # Factor the leftovers and recombine
        c[0] = factored(c[0], depth + 1)
        terms.append((v[0], c[0]))
    # Combines trivial terms whenever possible
    i=0
    def termParser(thing): return str(thing[1])
    terms = sorted(terms, key = termParser)
    while i<len(terms)-1:
        if equals(terms[i][1], terms[i+1][1]):
            terms[i] = (terms[i][0]+terms[i+1][0], terms[i][1])
            del terms[i+1]
        else:
            i += 1
    recombine = sum([terms[i][0]*terms[i][1] for i in range(len(terms))])
    return simplify(recombine, ratio = 1)

Best way to create generic/method consistency for sort.data.frame?

I've finally decided to put the sort.data.frame method that's floating around the internet into an R package. It just gets requested too much to be left to an ad hoc method of distribution.
However, it's written with arguments that make it incompatible with the generic sort function:
sort(x,decreasing,...)
sort.data.frame(form,dat)
If I change sort.data.frame to take decreasing as an argument, as in sort.data.frame(form,decreasing,dat), and discard decreasing, then it loses its simplicity because you'll always have to specify dat= and can't really use positional arguments. If I add it to the end, as in sort.data.frame(form,dat,decreasing), then the order doesn't match that of the generic function. If I hope that decreasing gets caught up in the dots, as in sort.data.frame(form,dat,...), then when using position-based matching I believe the generic function will assign the second position to decreasing and it will get discarded. What's the best way to harmonize these two functions?
The full function is:
# Sort a data frame
sort.data.frame <- function(form,dat){
# Author: Kevin Wright
# http://tolstoy.newcastle.edu.au/R/help/04/09/4300.html
# Some ideas from Andy Liaw
# http://tolstoy.newcastle.edu.au/R/help/04/07/1076.html
# Use + for ascending, - for descending.
# Sorting is left to right in the formula
# Usage is either of the following:
# sort.data.frame(~Block-Variety,Oats)
# sort.data.frame(Oats,~-Variety+Block)
# If dat is the formula, then switch form and dat
if(inherits(dat,"formula")){
f=dat
dat=form
form=f
}
if(form[[1]] != "~") {
stop("Formula must be one-sided.")
}
# Make the formula into character and remove spaces
formc <- as.character(form[2])
formc <- gsub(" ","",formc)
# If the first character is not + or -, add +
if(!is.element(substring(formc,1,1),c("+","-"))) {
formc <- paste("+",formc,sep="")
}
# Extract the variables from the formula
vars <- unlist(strsplit(formc, "[\\+\\-]"))
vars <- vars[vars!=""] # Remove spurious "" terms
# Build a list of arguments to pass to "order" function
calllist <- list()
pos=1 # Position of + or -
for(i in 1:length(vars)){
varsign <- substring(formc,pos,pos)
pos <- pos+1+nchar(vars[i])
if(is.factor(dat[,vars[i]])){
if(varsign=="-")
calllist[[i]] <- -rank(dat[,vars[i]])
else
calllist[[i]] <- rank(dat[,vars[i]])
}
else {
if(varsign=="-")
calllist[[i]] <- -dat[,vars[i]]
else
calllist[[i]] <- dat[,vars[i]]
}
}
dat[do.call("order",calllist),]
}
Example:
library(datasets)
sort.data.frame(~len+dose,ToothGrowth)
Use the arrange function in plyr. It allows you to individually pick which variables should be in ascending and descending order:
arrange(ToothGrowth, len, dose)
arrange(ToothGrowth, desc(len), dose)
arrange(ToothGrowth, len, desc(dose))
arrange(ToothGrowth, desc(len), desc(dose))
It also has an elegant implementation:
arrange <- function (df, ...) {
ord <- eval(substitute(order(...)), df, parent.frame())
unrowname(df[ord, ])
}
And desc is just an ordinary function:
desc <- function (x) -xtfrm(x)
Reading the help for xtfrm is highly recommended if you're writing this sort of function.
There are a few problems there. sort.data.frame needs to have the same arguments as the generic, so at a minimum it needs to be
sort.data.frame <- function(x, decreasing = FALSE, ...) {
....
}
To have dispatch work, the first argument needs to be the object dispatched on. So I would start with:
sort.data.frame <- function(x, decreasing = FALSE, formula = ~ ., ...) {
....
}
where x is your dat, formula is your form, and we provide a default for formula to include everything. (I haven't studied your code in detail to see exactly what form represents.)
Of course, you don't need to specify decreasing in the call, so:
sort(ToothGrowth, formula = ~ len + dose)
would be how to call the function using the above specifications.
Otherwise, if you don't want sort.data.frame to be an S3 method for the sort generic, call it something else and then you are free to have whatever arguments you want.
I agree with @Gavin that x must come first. I'd put the decreasing parameter after the formula though, since it probably isn't used that much, and hardly ever as a positional argument.
The formula argument would be used much more and therefore should be the second argument. I also strongly agree with @Gavin that it should be called formula, and not form.
sort.data.frame <- function(x, formula = ~ ., decreasing = FALSE, ...) {
...
}
You might want to extend the decreasing argument to allow a logical vector where each TRUE/FALSE value corresponds to one column in the formula:
d <- data.frame(A=1:10, B=10:1)
sort(d, ~ A+B, decreasing=c(A=TRUE, B=FALSE)) # sort by decreasing A, increasing B