stata multiple imputation impute chained - imputation

I have been trying to do multiple imputation in Stata on a big dataset (600k rows), but I am getting some errors that I can't explain.
I have also tried different approaches, but I always run into problems. I hope you can help me; I am fairly new to multiple imputation.
And sorry for the German variable names, but I guess they won't cause you big problems.
set more off
set level 99
mi set mlong
mi misstable patterns leistungsfähig sa_n Bewilligungsdiagnosegruppen1 Berufstellung Arbeitsunf Erwerbst Rehadauer1 sb_n Berufsgrkl famstand1 Rehaart ORT bb_n, frequency
mi register imputed leistungsfähig sa_n Bewilligungsdiagnosegruppen1 Berufstellung Arbeitsunf Erwerbst Rehadauer1 sb_n Berufsgrkl famstand1 Rehaart ORT bb_n
mi impute chained (logit, augment) leistungsfähig (regress, bootstrap) Rehadauer1 (ologit) sb_n bb_n (mlogit, augment) sa_n Bewilligungsdiagnosegruppen1 Berufstellung Arbeitsunf Erwerbst Berufsgrkl famstand1 Rehaart ORT = sexn Alter_n AmR AHB, add(5) dots savetrace(trace1, replace)
Errors:
imputing m=1 through m=5
matsize too small
You have attempted to create a matrix with too many rows or columns or attempted to fit a model with too many variables. You need to increase matsize; it is currently 400. Use set matsize; see help matsize.
(I increased matsize to 450 and higher; still the same error.)
If you are using factor variables and included an interaction that has lots of missing cells, either increase matsize or set emptycells drop to reduce the required matrix size; see help set emptycells.
(I used set emptycells drop; still the same error.)
If you are using factor variables, you might have accidentally treated a continuous variable as a categorical, resulting in lots of categories. Use the c. operator on such variables.
error occurred during imputation of leistungsfähig Rehadauer1 sb_n bb_n sa_n Bewilligungsdiagnosegruppen1 Berufstellung Arbeitsunf Erwerbst Berufsgrkl famstand1 Rehaart ORT on m = 1
The problem is that Rehadauer1 is the only non-categorical variable. So do I have to write c.Rehadauer1, or what does the message mean?
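The error messages point at matrix size growing with the number of categories. Below is a rough back-of-envelope sketch in Python of why many categorical variables can push a model past matsize 400. Two assumptions here do not come from the thread. First, each categorical predictor entered as a factor variable adds (levels - 1) indicator columns. Second, a multinomial logit estimates one coefficient set per non-base outcome category, and the matrix Stata needs scales with that parameter count. All level counts below are made up for illustration.

```python
def indicator_columns(levels: int) -> int:
    """Columns a categorical predictor with `levels` categories adds (base level omitted)."""
    return levels - 1


def mlogit_parameters(outcome_levels: int, predictor_columns: int) -> int:
    """Coefficients in a multinomial logit: one equation per non-base outcome, each with a constant."""
    return (outcome_levels - 1) * (predictor_columns + 1)


# Hypothetical level counts, NOT taken from the question's data.
predictor_levels = [40, 12, 9, 8, 6, 5]  # categorical predictors entered as factor variables
continuous_predictors = 4                # continuous covariates entered as-is
outcome_levels = 15                      # categories of the variable being imputed

columns = sum(indicator_columns(k) for k in predictor_levels) + continuous_predictors
params = mlogit_parameters(outcome_levels, columns)
```

With these invented numbers, `columns` is 78 and `params` is 1106, well above 400. This is only meant to show how quickly the count multiplies, not to diagnose the actual dataset.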
Another approach, with mi and ice:
mi register imputed leistungsfähig sa_n Bewilligungsdiagnosegruppen1 Berufstellung Arbeitsunf Erwerbst Rehadauer1 sb_n Berufsgrkl famstand1 Rehaart ORT bb_n
mi export ice, clear
ice leistungsfähig m.sa_n m.Bewilligungsdiagnosegruppen1 m.Berufstellung m.Arbeitsunf m.Erwerbst i.sexn o.sb_n o.Berufsgrkl m.famstand1 m.Rehaart i.AHB m.ORT o.bb_n Alter_n Rehadauer1 AmR, m(5) cmd(AmR Rehadauer1 Alter_n: regress) nopp saving(icedata, replace)
The problem here is that it always throws the error "perfect prediction detected" when run without nopp.


Wrapping TimerOutputs macros

I've got a situation where it would be handy to have a variable, to, which can be either a TimerOutput or nothing. I'm interested in providing a macro that takes the same arguments as @timeit from TimerOutputs (e.g. @timeit to "time spent" s = foo()). Because to may be nothing, I can't simply disable a TimerOutput.
If to is set, my preference would be to pass the arguments along to @timeit, but I could live with calling timer_expr(__module__, false, args...).
If to is not set, I'd want to just return the remaining arguments (something like args[3:end], maybe) as an expression.
I've been fussing with this for a day or so, and can handle each of the cases in isolation. For the case where I'm not involving TimerOutput, this seems to work:
macro no_timer(args...)
    args[3:end][1]
end
And for the case where I am using TimerOutput, I can do this:
using TimerOutputs: timer_expr  # timer_expr is internal to TimerOutputs (not exported)
macro with_timer(args...)
    timer_expr(__module__, false, args...)
end
Not surprising, as that's just what @timeit does.
I haven't figured out how to handle both cases in one macro. I've gotten the closest by wrapping everything in a ternary operator - i.e. return :(isnothing($(args[1])) ? <expression stuff> : <TimerOutput stuff>), but there is some level of abstraction mismatch that I haven't unsnarled.
Addendum: I've come to the conclusion that my original framing was an "X-Y" problem. I didn't actually need a new macro to solve my problem - hence my accepting the answer I did. That said, I am struck by the fact that both of the answers proffered stayed well away from defining a macro.
You already have this functionality in TimerOutputs.
Just use the disable_timer! and enable_timer! functions.
julia> const to = TimerOutput();
julia> disable_timer!(to);
julia> @timeit to "sleep" sleep(0.02)
julia> to
 ────────────────────────────────────────────────────────────────────
                            Time                   Allocations
                    ───────────────────────   ────────────────────────
  Tot / % measured:      23.6s / 0.0%            464KiB / 0.0%

 Section    ncalls     time   %tot     avg     alloc   %tot      avg
 ────────────────────────────────────────────────────────────────────
 ────────────────────────────────────────────────────────────────────
julia> enable_timer!(to);
julia> @timeit to "sleep" sleep(0.02)
julia> to
 ────────────────────────────────────────────────────────────────────
                            Time                   Allocations
                    ───────────────────────   ────────────────────────
  Tot / % measured:      37.3s / 0.1%            777KiB / 0.0%

 Section    ncalls     time   %tot     avg     alloc   %tot      avg
 ────────────────────────────────────────────────────────────────────
 sleep           1   22.6ms  100.0%  22.6ms     320B  100.0%     320B
 ────────────────────────────────────────────────────────────────────
You can define your own, more specific method of the timer_expr function that the macro calls, where the returned expression checks whether the to argument is nothing and then does the appropriate thing.
using TimerOutputs  # brings the module name into scope for the TimerOutputs.* internals below
import TimerOutputs: timer_expr
function timer_expr(m::Module, is_debug::Bool, to::Symbol, label::String, ex::Expr)
    unescaped(ex) = ex.head == :escape ? ex.args[1] : ex
    # this is from the original timer_expr functions,
    # to be used when `to` isn't nothing
    timer_ex = TimerOutputs.is_func_def(ex) ?
        unescaped(TimerOutputs.timer_expr_func(m, is_debug, to, ex, label)) :
        TimerOutputs._timer_expr(m, is_debug, to, label, ex)
    cond_ex = esc(:(if isnothing($to)
        $ex
    else
        placeholder # dummy symbol, to be replaced
    end))
    # args[3] = "else" section,
    # the placeholder is args[2] within that
    unescaped(cond_ex).args[3].args[2] = timer_ex
    cond_ex
end
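Both answers rely on a run-time check: if the timer is nothing, just run the body; otherwise time it. As a language-neutral illustration of that dispatch (not of TimerOutputs itself), here is a small Python sketch. `SimpleTimer` and `maybe_timed` are hypothetical names, and `contextlib.nullcontext` plays the role of "do nothing extra".

```python
import time
from contextlib import contextmanager, nullcontext


class SimpleTimer:
    """Hypothetical stand-in for a TimerOutput: accumulates seconds and call counts per label."""

    def __init__(self):
        self.totals = {}
        self.calls = {}

    @contextmanager
    def section(self, label):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.totals[label] = self.totals.get(label, 0.0) + elapsed
            self.calls[label] = self.calls.get(label, 0) + 1


def maybe_timed(timer, label):
    """Time the block if a timer is given; otherwise just run it (the `isnothing(to)` branch)."""
    return nullcontext() if timer is None else timer.section(label)
```

Usage is `with maybe_timed(to, "sleep"): time.sleep(0.02)`, where `to` may be a `SimpleTimer` or `None`. The check happens each time the block runs, which is the same trade-off the generated `if isnothing(to)` expression makes.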

Calculating the e number using Raku

I'm trying to calculate the e constant (AKA Euler's number) by calculating the formula $e = \sum_{k=0}^{\infty} \frac{1}{k!}$.
In order to calculate the factorial and division in one shot, I wrote this:
my @e = 1, { state $a=1; 1 / ($_ * $a++) } ... *;
say reduce * + * , @e[^10];
But it didn't work out. How to do it correctly?
I analyze your code in the section Analyzing your code. Before that, I present a couple of fun sections of bonus material.
One liner, one letter¹
say e; # 2.718281828459045
"A treatise on multiple ways"²
That is the title of Damian Conway's extraordinary article on computing e in Raku.
The article is a lot of fun (after all, it's Damian). It's a very understandable discussion of computing e. And it's a homage to Raku's bicarbonate reincarnation of the TIMTOWTDI philosophy espoused by Larry Wall.³
As an appetizer, here's a quote from about halfway through the article:
Given that these efficient methods all work the same way—by summing (an initial subset of) an infinite series of terms—maybe it would be better if we had a function to do that for us. And it would certainly be better if the function could work out by itself exactly how much of that initial subset of the series it actually needs to include in order to produce an accurate answer...rather than requiring us to manually comb through the results of multiple trials to discover that.
And, as so often in Raku, it’s surprisingly easy to build just what we need:
sub Σ (Unary $block --> Numeric) {
    # Unary and &converge are not Raku built-ins; they are defined elsewhere in the article.
    (0..∞).map($block).produce(&[+]).&converge
}
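To make the "work out by itself how much of the series it needs" idea concrete outside Raku, here is a loose Python analogue. `itertools.accumulate` stands in for `.produce(&[+])`. The rule "stop when two consecutive partial sums are exactly equal" is an assumed stand-in for the article's `converge`, not its actual definition.

```python
import math
from itertools import accumulate, count


def sum_until_converged(term):
    """Sum term(0) + term(1) + ... and stop once a partial sum repeats exactly.

    With floats the terms eventually become too small to change the running
    total, so the loop terminates for a convergent series like 1/k!.
    """
    previous = None
    for partial in accumulate(term(k) for k in count()):
        if partial == previous:
            return partial
        previous = partial


approx_e = sum_until_converged(lambda k: 1 / math.factorial(k))
```

For 1/k! this stops after roughly twenty terms, once each new term is below half a unit in the last place of the running sum.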
Analyzing your code
Here's the first line, generating the series:
my @e = 1, { state $a=1; 1 / ($_ * $a++) } ... *;
The closure ({ code goes here }) computes a term. A closure has a signature, either implicit or explicit, that determines how many arguments it will accept. In this case there's no explicit signature. The use of $_ (the "topic" variable) results in an implicit signature that requires one argument that's bound to $_.
The sequence operator (...) repeatedly calls the closure on its left, passing the previous term as the closure's argument, to lazily build a series of terms until the endpoint on its right, which in this case is *, shorthand for Inf aka infinity.
The topic in the first call to the closure is 1. So the closure computes and returns 1 / (1 * 1), yielding the first two terms in the series as 1, 1/1.
The topic in the second call is the value of the previous term, 1/1, i.e. 1 again. So the closure computes and returns 1 / (1 * 2), extending the series to 1, 1/1, 1/2. It all looks good.
The third call computes 1 / (1/2 * 3), which is 0.666667. That term should be 1 / (1 * 2 * 3). Oops.
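The walkthrough above can be replayed exactly with Python's `fractions.Fraction`. This sketch mirrors the question's closure: each new term is `1 / (previous_term * a)` with `a` counting up from 1, like `state $a = 1` with `$a++`. The fourth term comes out as 2/3 instead of 1/6.

```python
from fractions import Fraction


def buggy_terms(n):
    """Replay the question's closure: next = 1 / (previous * a), a = 1, 2, 3, ..."""
    terms = [Fraction(1)]  # the seed value before `...`
    a = 1                  # plays the role of `state $a = 1`
    while len(terms) < n:
        terms.append(1 / (terms[-1] * a))
        a += 1             # the `$a++` post-increment
    return terms
```

`buggy_terms(4)` gives 1, 1, 1/2, 2/3, which matches the trace above.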
Making your code match the formula
Your code is supposed to match the formula: $e = \sum_{k=0}^{\infty} \frac{1}{k!}$
In this formula, each term is computed based on its position in the series. The kth term in the series (where k=0 for the first 1) is just factorial k's reciprocal.
(So it's got nothing to do with the value of the prior term. Thus $_, which receives the value of the prior term, shouldn't be used in the closure.)
Let's create a factorial postfix operator:
sub postfix:<!> (\k) { [×] 1 .. k }
(× is an infix multiplication operator, a nicer looking Unicode alias of the usual ASCII infix *.)
That's shorthand for:
sub postfix:<!> (\k) { 1 × 2 × 3 × .... × k }
(I've used pseudo-metasyntactic notation inside the braces to denote the idea of multiplying as many terms as required.)
More generally, putting an infix operator op in square brackets at the start of an expression forms a composite prefix operator that is equivalent to calling reduce with &[op] as its first argument: [op] LIST behaves like reduce &[op], LIST. See Reduction metaoperator for more info.
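For readers more at home in Python, `functools.reduce` with `operator.mul` is the closest analogue of `[×] 1 .. k`. The initial value 1 is passed explicitly so that the empty product (k = 0) gives 1, which is also what Raku's `[×]` returns for an empty list.

```python
from functools import reduce
from operator import mul


def factorial(k):
    """Analogue of Raku's `[×] 1 .. k`: fold multiplication over 1..k, starting from 1."""
    return reduce(mul, range(1, k + 1), 1)
```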
Now we can rewrite the closure to use the new factorial postfix operator:
my @e = 1, { state $a=1; 1 / $a++! } ... *;
Bingo. This produces the right series.
... until it doesn't, for a different reason. The next problem is numeric accuracy. But let's deal with that in the next section.
A one liner derived from your code
Maybe compress the three lines down to one:
say [+] .[^10] given 1, { 1 / [×] 1 .. ++$ } ... Inf
.[^10] applies to the topic, which is set by the given. (^10 is shorthand for 0..9, so the above code computes the sum of the first ten terms in the series.)
I've eliminated the $a from the closure computing the next term. A lone $ is the same as (state $), an anonymous state scalar. I made it a pre-increment instead of a post-increment to achieve the same effect you got by initializing $a to 1.
We're now left with the final (big!) problem, pointed out by you in a comment below.
Provided neither of its operands is a Num (a float, and thus approximate), the / operator normally returns a 100% accurate Rat (a limited-precision rational). But if the denominator of the result exceeds 64 bits, that result is converted to a Num, which gives up accuracy in exchange for performance, a tradeoff we don't want to make. We need to take that into account.
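To see roughly where that limit starts to matter for this series, here is a quick Python check. It finds the first k for which k!, the denominator of the term 1/k!, no longer fits in an unsigned 64-bit integer. Reading "exceeds 64 bits" as "is at least 2**64" is my assumption about the exact cut-off.

```python
from math import factorial


def first_factorial_over_64_bits():
    """Smallest k such that k! does not fit in an unsigned 64-bit integer."""
    k = 0
    while factorial(k) < 2**64:
        k += 1
    return k
```

It returns 21, so summing ten terms as in the question is safe. From the 1/21! term onward, plain Rat division would fall back to Num.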
To specify unlimited precision as well as 100% accuracy, simply coerce the operation to use FatRats. To do this correctly, just make (at least) one of the operands a FatRat (and none of the others a Num):
say [+] .[^500] given 1, { 1.FatRat / [×] 1 .. ++$ } ... Inf
I've verified this to 500 decimal digits. I expect it to remain accurate until the program crashes due to exceeding some limit of the Raku language or Rakudo compiler. (See my answer to Cannot unbox 65536 bit wide bigint into native integer for some discussion of that.)
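Python's `fractions.Fraction` is likewise an unbounded exact rational, so it can serve as a cross-check of the FatRat approach. This sketch sums 500 terms exactly and reads off decimal digits by integer arithmetic. `e_exact` and `truncated_decimal` are illustrative helper names, not library functions.

```python
from fractions import Fraction
from math import factorial


def e_exact(n_terms):
    """Exact rational sum of 1/k! for k = 0 .. n_terms - 1 (no rounding, like FatRat)."""
    return sum((Fraction(1, factorial(k)) for k in range(n_terms)), Fraction(0))


def truncated_decimal(x, places):
    """Decimal expansion of a Fraction with 1 <= x < 10, truncated (not rounded) to `places` digits."""
    digits = str(x.numerator * 10**places // x.denominator)
    return digits[0] + "." + digits[1:]


exact_sum = e_exact(500)
```

`truncated_decimal(exact_sum, 50)` agrees with the published digits of e. The omitted tail after 500 terms is far smaller than 10**-50.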
Footnotes
1 Raku has a few important mathematical constants built in, including e, i, and pi (and its alias π). Thus one can write Euler's Identity in Raku somewhat like it looks in math books. With credit to RosettaCode's Raku entry for Euler's Identity:
# There's an invisible character between <> and i⁢π character pairs!
sub infix:<⁢> (\left, \right) is tighter(&infix:<**>) { left * right };
# Raku doesn't have built in symbolic math so use approximate equal
say e**i⁢π + 1 ≅ 0; # True
2 Damian's article is a must-read. But it's just one of several admirable treatments that are among the 100+ matches for a Google search for 'raku "euler's number"'.
3 See TIMTOWTDI vs TSBO-APOO-OWTDI for one of the more balanced views of TIMTOWTDI, written by a fan of Python. But there are downsides to taking TIMTOWTDI too far. To reflect this latter "danger", the Perl community coined the humorously long, unreadable, and understated TIMTOWTDIBSCINABTE -- There Is More Than One Way To Do It But Sometimes Consistency Is Not A Bad Thing Either, pronounced "Tim Toady Bicarbonate". Strangely enough, Larry applied bicarbonate to Raku's design and Damian applies it to computing e in Raku.
There are fractions in $_. Thus you need 1 / (1/$_ * $a++), or rather $_ / $a++.
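The `$_ / $a++` fix works because consecutive terms satisfy t_k = t_(k-1) / k, so each term can be built from the previous one. A minimal Python sketch of that recurrence with exact fractions:

```python
from fractions import Fraction


def terms_by_recurrence(n):
    """1/k! for k = 0 .. n-1, each built from the previous term: t_k = t_(k-1) / k."""
    terms = [Fraction(1)]
    for k in range(1, n):
        terms.append(terms[-1] / k)  # mirrors `$_ / $a++` with $a counting 1, 2, 3, ...
    return terms
```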
In Raku you could do this calculation step by step:
1.FatRat,1,2,3 ... * #1 1 2 3 4 5 6 7 8 9 ...
andthen .produce: &[*] #1 1 2 6 24 120 720 5040 40320 362880
andthen .map: 1/* #1 1 1/2 1/6 1/24 1/120 1/720 1/5040 1/40320 1/362880 ...
andthen .produce: &[+] #1 2 2.5 2.666667 2.708333 2.716667 2.718056 2.718254 2.718279 2.718282 ...
andthen .[50].say #2.71828182845904523536028747135266249775724709369995957496696762772
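For comparison, here is the same step-by-step pipeline as lazy Python generators. `itertools.accumulate` plays the role of `.produce`, and `Fraction` stands in for FatRat. This is an analogue of the Raku chain, not a claim about Raku's exact laziness semantics.

```python
from fractions import Fraction
from itertools import accumulate, chain, count, islice
from operator import mul


def e_partial_sums():
    """Lazy pipeline mirroring the Raku answer, stage by stage."""
    naturals = chain([Fraction(1)], count(1))  # 1, 1, 2, 3, 4, ...
    factorials = accumulate(naturals, mul)     # 1, 1, 2, 6, 24, ...   (.produce: &[*])
    reciprocals = (1 / f for f in factorials)  # 1, 1, 1/2, 1/6, ...   (.map: 1/*)
    return accumulate(reciprocals)             # 1, 2, 5/2, 8/3, ...   (.produce: &[+])


fifty_first = next(islice(e_partial_sums(), 50, None))  # like .[50]
```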

DBI binding parameters and square brackets

I'm having problems with the following code:
$sql = <<"END_SQL";
SELECT DISTINCT Matching.[CI M], Matching2_1.[LAC M], Matching2_1.[CI M], Matching.[Band M], Matching2_1.[Band M], Matching.Site, Matching2_1.Site, Matching.[BSC/RNC], Matching.[CellName M], Matching.[BSC/RNC M], Matching2_1.[CellName M]
FROM Matching, [N 900 - 900], Matching AS Matching2_1
WHERE Matching.[Band M]= ? AND Matching2_1.[Band M]= ? ;
END_SQL
$sth = $dbh->prepare($sql);
$sth->execute(900, 900);
The column names contain spaces and the database is MS Access, so I use square brackets to reference them in the query.
The problem is that Perl interprets the square brackets as binding parameters and expects 11 parameters.
Here is the error:
DBD::ODBC::st execute failed: [Microsoft][ODBC Microsoft Access Driver] Too few parameters. Expected 11. (SQL-07002) at NEIGHBORS MAPPING.pl line 89.
The Access Database Engine also recognizes backquotes (backticks) as table/field delimiters, so if the square brackets are giving you trouble, try
SELECT DISTINCT Matching.`CI M`, ...
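If a query has many bracketed names, like the one in the question, the rewrite can be done mechanically before the SQL string is passed to prepare. Below is a rough Python sketch of that rewrite. `brackets_to_backquotes` is a hypothetical helper, not part of DBI or DBD::ODBC, and it assumes no square brackets occur inside string literals.

```python
import re

# A bracketed identifier: "[" then one or more non-bracket characters, then "]".
BRACKETED_IDENTIFIER = re.compile(r"\[([^\[\]]+)\]")


def brackets_to_backquotes(sql):
    """Rewrite Access-style [Name With Spaces] identifiers as `Name With Spaces`."""
    return BRACKETED_IDENTIFIER.sub(r"`\1`", sql)
```

Whether backquotes also clear the "Too few parameters" error depends on the actual cause, which the answer does not pin down.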

Is this the simplified version of this boolean expression? Or is this reviewer wrong

I've tried doing the truth table, but unfortunately one expression has 3 literals and the other has 4, so I got confused.
F = (A+B+C)(A+B+D')+B'C;
And this is the simplified version:
F = A + B + C
http://www.belley.org/etc141/Boolean%20Sinplification%20Exercises/Boolean%20Simplification%20Exercise%20Questions.pdf
I think there's something wrong with this reviewer. Or is it accurate?
By the way, is simplification different from minimizing from a sum of minterms to a sum of products?
Yes, it is the same function.
Draw the truth table for both expressions, assuming that there are four input variables in both. The value of D will not play into the second truth table: values in cells with D=1 will match values in cells with D=0. In other words, you can think of the second expression as
F = A + B + C + (0)(D)
You will see that both tables match: the (A+B+C)(A+B+D') subexpression has zeros at ABCD = {0000, 0001, 0011}; (A+B+C) has zeros only at {0000, 0001}. Adding B'C patches the zero at 0011 in the first subexpression, so the results are equivalent.
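The truth-table argument can be checked exhaustively over all 16 rows. This Python sketch encodes + as OR, juxtaposition as AND and ' as NOT. Both functions take all four inputs, and the simplified one simply ignores D.

```python
from itertools import product


def product_term(a, b, c, d):
    # (A+B+C)(A+B+D')
    return (a or b or c) and (a or b or not d)


def original(a, b, c, d):
    # F = (A+B+C)(A+B+D') + B'C
    return product_term(a, b, c, d) or (not b and c)


def simplified(a, b, c, d):
    # F = A + B + C; D is accepted but ignored
    return a or b or c


def rows():
    """All 16 input rows in ABCD order 0000, 0001, ..., 1111."""
    return list(product((False, True), repeat=4))


def zeros_of(f):
    """Rows (as 'ABCD' bit strings) where f is false."""
    return ["".join(str(int(x)) for x in r) for r in rows() if not f(*r)]
```

`zeros_of(product_term)` gives 0000, 0001 and 0011. Both `original` and `simplified` are zero only at 0000 and 0001, so the two expressions are equivalent.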

Perl syntax errors on declaring a large constant array

I want to declare a huge constant array, like:
my @tlds = (ac ad ae aero af ag ai al am an ao aq ar arpa as asia at au aw ax az ba bb bd be bf bg bh bi biz bj bm bn bo br bs bt bv bw by bz ca cat cc cd cf cg ch ci ck cl cm cns
co com coop cr cu cv cw cx cy cz de dj dk dm do dz ec edu ee eg er es et eu fi fj fk fm fo fr ga gb gd ge gf gg gh gi gl gm gn gov gp gq gr gs gt gu gw gy hk hm hn hr ht hu id ie
il im in info int io iq ir is it je jm jo jobs jp ke kg kh ki km kn kp kr kw ky kz la lb lc li lk lr ls lt lu lv ly ma mc md me mg mh mil mk ml mm mn mo mobi mp mq mr ms mt mu museum mv mw mx my mz na name nc ne net nf ng ni nl no np nr nu nz om org pa pe pf pg ph pk pl pm pn pr pro ps pt pw py qa re ro rs ru rw sa sb sc sd se sg sh si sj sk sl sm sn so sr st su sv sx sy sz tc td tel tf tg th tj tk tl tm tn to tp tr travel tt tv tw tz ua ug uk us uy uz va vc ve vg vi vn vu wf ws xn xxx ye yt za zm zw);
But it throws errors:
1. Syntax error, near dm do dz
2. no such class mz, near "mw mx my mz"
Any pointers on how to fix these errors?
If I put qw before that list, there are no errors. Why? What's wrong with the above declaration?
qw does quoting and separating for you.
my @foo = ( "bar", "baz" );
means the same as:
my @foo = qw( bar baz );
Having a stack of sequential unquoted values is just an error.
See the documentation for quote-like operators.
The qw operator is a quoting operator, as are all of the other q* keywords (q, qq, qw, qr, qx). Each takes a delimiting character (or a pair of bracketing characters) and treats everything within the delimiters as a string. Each operator does something different to the string; qw splits it on whitespace to create a list.
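In Python terms, qw(...) does roughly what `str.split()` with no arguments does: split on runs of whitespace, with no interpolation. This is a rough analogy for illustration. The `qw` function below is a hypothetical helper, not a Python built-in.

```python
def qw(text):
    """Rough Python analogue of Perl's qw(...): split on runs of whitespace."""
    return text.split()


# Keywords like "do" and "int" are just strings here, as they are inside qw(...).
tlds = qw("""
    ac ad ae aero af ag ai al am an ao aq ar
    dm do dz int lc
""")
```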
When you write a series of barewords in Perl, you end up with a large nested chain of indirect object calls. Here is a short example without keywords (so that it is not a syntax error):
$ perl -MO=Deparse -e 'ac ad ae aero af ag ai al am an ao aq ar'
'ad'->ac('aero'->ae('ag'->af('al'->ai('an'->am('aq'->ao('ar'))))));
-e syntax OK
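The nesting in that Deparse output follows a simple pattern: each pair "method invocant" becomes 'invocant'->method(...), and the rest of the list becomes its argument. The toy Python model below reproduces exactly the output shown. It is an illustration of the pattern only, not perl's real grammar, and it only handles an odd number of words (the case shown).

```python
def indirect_object_chain(words):
    """Toy model of how a run of barewords nests into indirect-object method calls."""
    if len(words) % 2 == 0:
        raise ValueError("toy model only handles an odd number of words")
    if len(words) == 1:
        return f"'{words[0]}'"
    method, invocant, *rest = words
    return f"'{invocant}'->{method}({indirect_object_chain(rest)})"
```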
In your case, perl merrily went along parsing what looks like indirect object syntax until it encountered a keyword, which disrupted the chain and caused a syntax error.
If you had not used a keyword in your list, the code would have compiled fine, and then you would have gotten a runtime error about a missing method in a package. If you were running your code under the use strict; pragma (which you always should), the final bareword would have become a syntax error (since strict subs prevents promoting barewords to strings). That would at least have caught the error at compile time.
The important takeaway from this is that Perl has many quote-like operators that are effectively strings with special processing attached. Removing the quote-like operator will inevitably result in syntax errors, since arbitrarily formatted strings are not valid Perl. A list of the built-in quote-like operators can be found on the perlop manpage.
This is because when you omit qw/.../, your words are treated as barewords, and when perl reaches "do", which is a keyword, an error is signaled.
EDIT: Even though my answer explains the reason for the error (well, I believe so), @Quentin's suggestion is more constructive: do not use barewords (i.e. do use qw// in your example), to save yourself the time spent catching errors like this one. For example, your list also contains int, which is a keyword, lc (a function), etc.