Stata: append two datasets, retain value labels - append

I'm using Stata14 and I'm trying to append two survey datasets that have ~200 variables with same names but different values and value labels. I would like to do the appending so that value labels are retained from the dataset 'on disk'.
Here is an example describing my problem:
Variable in dataset 1 (master):
value - label
1 - yes
2 - no
Same variable in dataset 2 (appended to master):
value - label
1 - yes absolutely
2 - no definitely not
3 - maybe
4 - don't know
Result with append using "dataset 2.dta"
value - label
1 - yes
2 - no
3 - 3
4 - 4
Desired result:
value - label
1 - yes
2 - no
3 - maybe
4 - don't know
Is there any way to do this directly using append? If not, any suggestions on doing the task efficiently are most welcome.

You want to make value labels consistent, which is sensible, fine and easy to do.
When you have appended all the datasets, you then overwrite any value label assignment with a quick
label define whatever 1 yes 2 no 3 maybe 4 "don't know"
label val myvar whatever
with a , modify on the first if a set of value labels with that name already exists.
It's a task to do late. It doesn't need to be fixed before or during the append, and it can most easily be done at that point.
Naturally, this is tedious for several variables, but it's not difficult to understand. Furthermore, even if append were able to take instructions on which labels to use, you would still have to spell that out. In your example, the value labels you want are not actually in use in any of the datasets. So, there will be some inevitable pain. There is a mess to sort out and what the fix is can't be fully automated because it depends on your ideas of what labels are best.

in short, the answer is
NOPE
so you have to be smart. Try to use this trick http://www.stata.com/support/faqs/data-management/keeping-same-variable-with-collapse/ where you get a local copy of the labels that you will be attaching to the full dataset afterwards.

Related

How to reach the end of a chain of cells

I have a file with two sheets:
sheet_A
A B
1 Mr. Joe USD
sheet_B
A B
1 =sheet_A.A1 ???
sheet_B.B1 shall show the value USD. I know i could get it easily with =sheet_A.B1 but I do not want that.
If I enter into sheet_B.B1 =ADDRESS(ROW();COLUMN()-1) I get the output $C$1 and with =INDIRECT(ADDRESS(ROW();COLUMN()-1)) Mr. Joe.
How can I "navigate" through a chain sheet_B.B1 - sheet_B.A1 - sheet_A.A1 - sheet_A.B1?
Edit 1
Maybe I need something like this
=OFFSET(FORMULA(ADDRESS(ROW();COLUMN()-1);0;1)#
sheet_B.B2 shall show the content of sheet_A.B2 in relation of the value in sheet_B.A1
Here are two possibilities. Either formula produces USD in sheet_B.B1.
=INDIRECT(ADDRESS(ROW();COLUMN();;;"sheet_A"))
=VLOOKUP(A1;$sheet_A.A1:B1;2)
Documentation: ADDRESS, VLOOKUP.
EDIT:
One more idea: The following produces the string "=sheet_A.A1", which could be parsed for the sheet name and cell address. Perhaps you would like to use it to refer to sheet_A.B1.
=FORMULA(INDIRECT(ADDRESS(ROW();COLUMN()-1)))
However, as I commented, there is probably an easier way for what you are trying to accomplish.
Documentation: FORMULA.
EDIT 2:
Here is the formula you requested. It uses REPLACE to remove = at the beginning of the string.
=OFFSET(INDIRECT(REPLACE(FORMULA(INDIRECT(ADDRESS(ROW();COLUMN()-1)));1;1;""));0;1)

Create ordinal array with multiple groups

I need to categorize a dataset according to different age groups. The categorization depends on whether the Sex is Male or Female. I first subset the data by gender and then use the ordinal function (dataset is from a Matlab example). The following code crashes on the last line when I try to vertically concatenate the subsets:
load hospital;
subset_m=hospital(hospital.Sex=='Male',:);
subset_f=hospital(hospital.Sex=='Female',:);
edges_f=[0 20 max(subset_f.Age)];
edges_m=[0 30 max(subset_m.Age)];
labels_m = {'0-19','20+'};
labels_f = {'0-29','30+'};
subset_m.AgeGroup= ordinal(subset_m.Age,labels_m,[],edges_m);
subset_f.AgeGroup = ordinal(subset_f.Age,labels_f,[],edges_f);
vertcat(subset_m,subset_f);
Error using dataset/vertcat (line 76)
Could not concatenate the dataset variable 'AgeGroup' using VERTCAT.
Caused by:
Error using ordinal/vertcat (line 36)
Ordinal levels and their ordering must be identical.
Edit
It seems that a vital part was missing in the question, here is the answer to the corrected question. You need to use join rather than vertcat, for example:
joinFull = join(subset_f,subset_m,'LeftKeys','LastName','RightKeys','LastName','type','rightouter','mergekeys',true)
Solution of original problem
It seems like you are actually trying to work with the wrong variable. If I change all instances of hospitalCopy into hospital then everything works fine for me.
Perhaps you copied hospital and edited it, thus losing the validity of the input.
If you really need to have hospitalCopy make sure to assign to it directly after load hospital.
If this does not help, try using clear all before running the code and make sure there is no file called 'hospital' in your current directory.

How to add operator signs '+,-,/,*,mod' etc to a label for making a calculator?

I have made a calculator for simple operations but I cant figure out how should I add the operator signs next to the numerals that I am entering.
I created 2 functions 1 on the number being entered
-(IBAction)buttonDigitPressed:(id)sender
and another for the operation
-(IBAction)buttonOperationPressed:(id)sender.
calculatorScreen.text = [NSString stringWithFormat:#"%.2f",result];
This is for the result to be shown on the label calculatorScreen.
The result i would like would be something like "1+2*3/4" on the calculatorScreen.
Sorry if I misunderstand your question, but what you want is to display on your calculator app the full equation that you've input thus far (e.g. 63+42-62).
Like any other calculator, you should have 2 label, one for your current input, and one to show all that you've entered.(I'm guessing you need the latter)
With the second label up, you can add in the append function into your digitpressed, enter/= function, operation function. If you want to tweak it such 16+23-32 will show up as
1) 16+23
2) 39-32
3) 39-32=7
then you'll have to add in your own specific code. otherwise the label will input as 16+23-32 = 7
You can just append the character to whatever is already on calculatorScreen. Or you can save the current input in an instance variable and display where appropriate.
This is just a guideline, since I don't know the behavior of your calculator in case of this input: 1 + 2 * 3 (simple calculator will return 9, scientific will return 7).

CakePhp: how to set a select using $this->Form->input with values from 1 to 100?

I am using Cakephp and I would like to learn how to set a select with values from 1 to 100?
Please notice I prefer to use $this->Form->input if possible.
TLDR:
echo $this->Form->input('whatever', array(
'type'=>'select',
'options'=>array_combine(range(1,100), range(1,100))
));
Explanation:
PHP's range creates an array of numbers (or letters), which is what you want for your options. But if you use range by itself, it creates:
array(1,2,3,4...
This would give you a dropdown of numbers, but the values will start with zero regardless of the displayed number - in this case, you'd end up with array(0=>1, 1=>2 ...
When you really want this:
array(1=>1, 2=>2, 3=>3 ...
By using array_combine just makes it so the first option has the same value as the displayed number.
(obviously you can write this in 1 line - I just broke it up for ease of reading)

COBOL add 0 to a Variable in COMPUTE

I ran into a strange statement when working on a COBOL program from $WORK.
We have a paragraph that is opening a cursor (from DB2), and the looping over it until it hits an EOT (in pseudo code):
... working storage ...
01 I PIC S9(9) COMP VALUE ZEROS.
01 WS-SUB PIC S9(4) COMP VALUE 0.
... code area ...
PARA-ONE.
PERFORM OPEN-CURSOR
PERFORM FETCH-CURSOR
PERFORM VARYING I FROM 1 BY 1 UNTIL SQLCODE = DB2EOT
do stuff here...
END-PERFORM
COMPUTE WS-SUB = I + 0
PERFORM CLOSE-CURSOR
... do another loop using WS-SUB ...
I'm wondering why that COMPUTE WS-SUB = I + 0 line is there. My understanding is that I will always at least be 1, because of the perform block above it (i.e., even if there is an EOT to start with, I will be set to one on that initial iteration).
Is that COMPUTE line even needed? Is it doing some implicit casting that I'm not aware of? Why would it be there? Why wouldn't you just MOVE I TO WS-SUB?
Call it stupid, but with some compilers (with the correct options in effect), given
01 SIGNED-NUMBER PIC S99 COMP-5 VALUE -1.
01 UNSIGNED-NUMBER PIC 99 COMP-5.
...
MOVE SIGNED-NUMBER TO UNSIGNED-NUMBER
DISPLAY UNSIGNED-NUMBER
results in: 255. But...
COMPUTE UNSIGNED-NUMBER = SIGNED-NUMBER + ZERO
results in: 1 (unsigned)
So to answer your question, this could be classified as a technique used cast signed numbers into unsigned numbers. However, in the code example you gave it makes no sense at all.
Note that the definition of "I" was (likely) coded by one programmer and of WS-SUB by another (naming is different, VALUE clause is different for same purpose).
Programmer 2 looks like "old school": PIC S9(4), signed and taking up all the digits which "fit" in a half-word. The S9(9) is probably "far over the top" as per range of possible values, but such things concern Programmer 1 not at all.
Probably Programmer 2 had concerns about using an S9(9) COMP for something requiring (perhaps many) fewer than 9999 "things". "I'll be 'efficient' without changing the existing code". It seems to me unlikely that the field was ever defined as unsigned.
A COMP/COMP-4 with nine digits does have a performance penalty when used for calculations. Try "ADD 1" to a 9(9) and a 9(8) and a 9(10) and compare the generated code. If you can have nine digits, define with 9(10), otherwise 9(8), if you need a fullword.
Programmer 2 knows something of this.
The COMPUTE with + 0 is probably deliberate. Why did Programmer 2 use the COMPUTE like that (the original question)?
Now it is going to get complicated.
There are two "types" of "binary" fields on the Mainframe: those which will contain values limited by the PICture clause (USAGE BINARY, COMP and COMP-4); those which contain values limited by the field size (USAGE COMP-5).
With BINARY/COMP/COMP-4, the size of the field is determined from the PICture, and so are the values that can be held. PIC 9(4) is a halfword, with a maxiumum value of 9999. PIC S9(4) a halfword with values -9999 through +9999.
With COMP-5 (Native Binary), the PICture just determines the size of the field, all the bits of the field are relevant for the value of the field. PIC 9(1) to 9(4) define halfwords, pic 9(5) to 9(9) define fullwords, and 9(10) to 9(18) define doublewords. PIC 9(1) can hold a maximum of 65535, S9(1) -32,768 through +32,767.
All well and good. Then there is compiler option TRUNC. This has three options. STD, the default, BIN and OPT.
BIN can be considered to have the most far-reaching affect. BIN makes BINARY/COMP/COMP-4 behave like COMP-5. Everything becomes, in effect, COMP-5. PICtures for binary fields are ignored, except in determining the size of the field (and, curiously, with ON SIZE ERROR, which "errors" when the maxima according to the PICture are exceeded). Native Binary, in IBM Enterprise Cobol, generates, in the main, though not exclusively, the "slowest" code. Truncation is to field size (halfword, fullword, doubleword).
STD, the default, is "standard" truncation. This truncates to "PICture". It is therefore a "decimal" truncation.
OPT is for "performance". With OPT, the compiler truncates in whatever way is the most "performant" for a particular "code sequence". This can mean intermediate values and final values may have "bits set" which are "outside of the range" of the PICture. However, when used as a source, a binary field will always only reflect the value specified by the PICture, even if there are "excess" bits set.
It is important when using OPT that all binary fields "conform to PICture" meaning that code must never rely on bits which are set outside the PICture definition.
Note: Even though OPT has been used, the OPTimizer (OPT(STD) or OPT(FULL)) can still provide further optimisations.
This is all well and good.
However, a "pickle" can readily ensue if you "mix" TRUNC options, or if the binary definition in a CALLing program is not the same as in the CALLed program. The "mix" can occur if modules within the same run-unit are compiled with different TRUNC options, or if a binary field on a file is written with one TRUNC option and later read with another.
Now, I suspect Programmer 2 encountered something like this: Either, with TRUNC(OPT) they noticed "excess bits" in a field and thought there was a need to deal with them, or, through the "mix" of options in a run-unit or "across file usage" they noticed "excess bits" where there would be a need to do something about it (which was to "remove the mix").
Programmer 2 developed the COMPUTE A = B + 0 to "deal" with a particular problem (perceived or actual) and then applied it generally to their work.
This is a "guess", or, better, a "rationalisation" which works with the known information.
It is a "fake" fix. There was either no problem (the normal way that TRUNC(OPT) works) or the correct resolution was "normalisation" of the TRUNC option across modules/file use.
I do not want loads of people now rushing off and putting COMPUTE A = B + 0 in their code. For a start, they don't know why they are doing it. For a continuation it is the wrong thing to do.
Of course, do not just remove the "+ 0" from any of these that you find. If there is a "mix" of TRUNCs, a program may stop "working".
There is one situation in which I have used "ADD ZERO" for a BINARY/COMP/COMP-4. This is in a "Mickey Mouse" program, a program with no purpose but to try something out. Here I've used it as a method to "trick" the optimizer, as otherwise the optimizer could see unchanging values so would generate code to use literal results as all values were known at compile time. (A perhaps "neater" and more flexible way to do this which I picked up from PhilinOxford, is to use ACCEPT for the field). This is not the case, for certain, with the code in question.
I wonder if a testing version of the sources ever had
COMPUTE WS-SUB = I + 0
ON SIZE ERROR
DISPLAY "WS-SUB overflow"
STOP RUN
END-COMPUTE
with the range test discarded when the developer was satisfied and cleaning up? MOVE doesn't allow declarative SIZE statements. That's as much of a reason as I could see. Or perhaps developer habit of using COMPUTE to move, as a subtle reminder to question the need for defensive code at every step? And perhaps not knowing, as Joe pointed out, the SIZE clause would be just as effective without the + 0? Or a maintainer struggled with off by one errors and there was a corrective change from 1 to 0 after testing?