I have 16 variables, each of which can take a value between 1 and 56. How can I encode the values of these 16 variables into a single variable? For example, {1,15,22,32,21,2,3,5,6,4,5,6,7,8,8,2} needs to be encoded, and the output should be something from which the exact value of each variable can be decoded.
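One possible approach, sketched here as an illustration rather than taken from the thread, is to treat each value as a digit in base 56 and pack the 16 digits into one integer. The names `BASE`, `encode` and `decode` are hypothetical, and the sketch assumes an integer type wide enough to hold the result (Python integers are arbitrary precision).

```python
BASE = 56  # each variable ranges over 1..56, i.e. 56 distinct values


def encode(values):
    """Pack a sequence of values in 1..BASE into one non-negative integer."""
    n = 0
    for v in values:
        if not 1 <= v <= BASE:
            raise ValueError(f"value out of range: {v}")
        n = n * BASE + (v - 1)  # shift to 0..BASE-1 and append as a base-56 digit
    return n


def decode(n, count=16):
    """Recover `count` values from an integer produced by encode()."""
    values = []
    for _ in range(count):
        n, digit = divmod(n, BASE)  # peel off the least significant digit
        values.append(digit + 1)
    return values[::-1]  # digits come out least-significant first


original = [1, 15, 22, 32, 21, 2, 3, 5, 6, 4, 5, 6, 7, 8, 8, 2]
packed = encode(original)
assert decode(packed) == original
```

Sixteen base-56 digits need about 93 bits (16 × log2(56) ≈ 92.9), so the packed value does not fit in a 64-bit integer; in a language without big integers it would need a 128-bit type, two words, or a string.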
I have a series of formatted numeric variables and I would like to convert them all into character variables assigned the corresponding values found in the format labels. Here is an example of the format:
proc format;
value Group
1= 'Experimental 1'
2= 'Experimental 2'
3= 'Treatment as usual';
run;
My variable Group_num has values 1-3 and has this format applied. I want to create a new character variable called Group_char which has the values "Experimental 1", "Experimental 2", and "Treatment as usual".
The (long) way I would do this would be:
data out;
set in;
format Group_char $30.;
if Group_num=1 then Group_char="Experimental 1";
if Group_num=2 then Group_char="Experimental 2";
if Group_num=3 then Group_char="Treatment as usual";
run;
However, I need to do this for 13 different variables, and I don't know their values, format names, or format labels without looking at the data more closely. Ideally, I would use whatever format is already applied to each variable to translate it automatically into a new character variable, without needing to know the format name, the labels, or the original values. Failing that, a solution that requires only the format name would still be better than one that also requires the original values and format labels.
Alternatively, another way to solve my problem would be if you could tell me if there is a way of importing SPSS datasets using variable value labels only, and leaving the values themselves out of the picture entirely, such that numeric variables with value labels are imported as character variables.
Thank you
First off, it's usually best not to do this - most of the time that you need the character bits, you can get them off of the formats.
But, that said... you need to look at the vvalue function.
data want;
set have;
var_char = vvalue(var_num);
run;
vvalue returns the formatted value of the argument.
I have a BER structure like this...
$ openssl asn1parse -inform der -in test.der -i -dump
????:d=4 hl=2 l=inf cons: cont [ 0 ]
????:d=5 hl=3 l= 240 prim: OCTET STRING
0000 - AABBCCDD
????:d=5 hl=2 l= 8 prim: OCTET STRING
0000 - EEFF
????:d=5 hl=2 l= 0 prim: EOC
...or in der2ascii style...
[0] `80`
OCTET_STRING { `AABBCCDD` }
OCTET_STRING { `EEFF` }
`0000`
What I know: indefinite-length encoding must use a constructed type, because primitive types may introduce ambiguities, e.g. when the contents contain 0x0000. What I want to know: How must a decoder behave when parsing this BER structure? Are the header bytes of both OCTET STRINGs included in the encoding? If yes, how is indefinite-length byte data encoded? How does an application interpret the value of the TLV field tagged [0] when the second OCTET STRING is, e.g., an INTEGER?
I am asking this question because in the CMS standard a field is defined as a single OCTET STRING, but in most BER encodings I see two of them. Is this only due to the indefinite-length encoding? Am I missing something?
From ITU-T X.690:
8.1.4 Contents octets
The contents octets shall consist of zero, one or more octets, and shall encode the data value as specified in
subsequent clauses.
NOTE – The contents octets depend on the type of the data value;
subsequent clauses follow the same sequence as the definition of types
in ASN.1.
Does this mean that I can put any constructed type there, and the application only has to interpret the value part of the constructed TLV structure?
When you encode a primitive OCTET STRING in indefinite length mode, the encoder must:
split up the value into chunks of smaller OCTET STRINGs
encode each chunk in definite length mode so that each has its own TLV (with length!)
the whole sequence of definite length encoded primitive OCTET STRINGs must be framed by a single, indefinite length encoded constructed OCTET STRING "container" having its own TLV (without length, but with end-of-octets sentinel)
At the other end, the decoder extracts the V part from each inner, definite length OCTET STRING chunk (dropping its TL header), then joins the V parts together in order of arrival, dropping the TL part of the outer frame as well.
Note that the idea behind indefinite length encoding technique is that both encoder and decoder can emit/consume incomplete, possibly oversized, data.
Chunk size is chosen by the encoder/application based on data availability, memory situation and possibly the estimation of decoder's buffering capabilities. I think this is mentioned somewhere in the X.280/X.680 papers.
Encoder is not allowed to put chunks of different ASN.1 types into any single indefinite length encoded container. In other words, all chunks must be of the same type as the outer container.
That should hopefully explain why you may see multiple (depending on chunk size) OCTET STRINGs in the indefinite length encoded BER/CER stream where just a single OCTET STRING is expected.
DER forbids indefinite length encoding on the grounds that serialized representation of the same data may change on re-encoding (due to potentially changing chunk size).
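The chunking scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not a full BER codec: the function names `encode_length`, `encode_chunked` and `decode_chunked` are hypothetical, the outer container is assumed to carry the universal constructed OCTET STRING tag (0x24) rather than a context tag such as [0], and only primitive chunks are handled (no nested constructed chunks).

```python
def encode_length(n: int) -> bytes:
    """Definite-length octets: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def encode_chunked(data: bytes, chunk_size: int) -> bytes:
    """Wrap data in a constructed, indefinite-length OCTET STRING."""
    out = bytearray(b"\x24\x80")  # constructed OCTET STRING tag, indefinite length
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        out += b"\x04" + encode_length(len(chunk)) + chunk  # primitive chunk TLV
    out += b"\x00\x00"  # end-of-contents octets close the container
    return bytes(out)


def decode_chunked(encoded: bytes) -> bytes:
    """Join the V parts of all chunks, dropping every TL header."""
    if encoded[:2] != b"\x24\x80":
        raise ValueError("expected constructed, indefinite-length OCTET STRING")
    pos, value = 2, bytearray()
    while encoded[pos:pos + 2] != b"\x00\x00":
        if pos + 2 > len(encoded):
            raise ValueError("missing end-of-contents octets")
        if encoded[pos] != 0x04:
            raise ValueError("chunk is not a primitive OCTET STRING")
        first = encoded[pos + 1]
        pos += 2
        if first < 0x80:  # short form: the octet is the length itself
            length = first
        elif first == 0x80:
            raise ValueError("primitive chunk cannot use indefinite length")
        else:  # long form: low 7 bits give the number of length octets
            count = first & 0x7F
            length = int.from_bytes(encoded[pos:pos + count], "big")
            pos += count
        value += encoded[pos:pos + length]
        pos += length
    return bytes(value)


encoded = encode_chunked(bytes.fromhex("aabbccddeeff"), chunk_size=4)
print(encoded.hex())                  # 24800404aabbccdd0402eeff0000
print(decode_chunked(encoded).hex())  # aabbccddeeff
```

With a chunk size of 4, the six input bytes come out as two inner OCTET STRINGs, which is the same shape as the dump in the question: several chunks on the wire, one logical value after decoding.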
I have data stored in a variable of type short. The data is in 2's complement form.
When I print the data using %04x, values with MSB=0 print fine, e.g. if data=740, the output is 0740
But when the MSB=1, I am unable to get a proper print. For eg if data=842, print I get is fffff842
I want the data truncated to 4 bytes so expected output is f842
Either declare your data as a type which is 16 bits long, or make sure the printing function uses the right format for a 16-bit value. Or keep your current type, but do a bitwise AND with 0xffff. What you can do really depends on the language you're using.
But whichever way you go, check your assumptions again. There seem to be a few issues in your question:
2s-complement applies to signed numbers only. There are no negative numbers in your question.
Assuming you mean C's short - it doesn't have to be 16 bits long.
"I get is fffff842 I want the data truncated to 4 bytes" - fffff842 is 4 bytes long. f842 is 2 bytes long.
The 2-byte value 842 (0x0842) does not have the MSB set.
I'm assuming C (or possibly C++) as the language here.
Because of the default argument promotions involved when calling a variable argument function (such as printf), your use of a short will result in an integer promotion, which states that "If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int".
A short is converted to an int by means of sign-extension, and 0xf842 sign-extended to 32 bits is 0xfffff842.
You can use a bitwise AND to mask off the most significant word:
printf("%04x", data & 0xffff);
You could also add the h length specifier to state that you only want to print an (unsigned) short worth of bits from an int:
printf("%04hx", data);
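The sign extension and the masking fix can be reproduced outside C. The following Python sketch emulates a 16-bit short holding the bit pattern 0xf842 and a 32-bit int; both widths are assumptions (C does not guarantee them), and the variable names are illustrative only.

```python
import struct

# The bytes 0x42 0xf8 (little-endian) read as a signed 16-bit value, like a C short.
(data,) = struct.unpack("<h", bytes.fromhex("42f8"))

# Bit pattern after sign extension to 32 bits, as when the short is promoted to int.
promoted = data & 0xFFFFFFFF

full = "%04x" % promoted           # what "%04x" shows after promotion
masked = "%04x" % (data & 0xFFFF)  # the masking fix: keep only the low 16 bits

print(data, full, masked)  # -1982 fffff842 f842
```

The width 4 in %04x is only a minimum, so it pads 0740 but never truncates fffff842; the mask (or the h length modifier in C) is what actually limits the output to 16 bits.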
I use eval-expression (M-:) to get some variable's value in the message buffer.
I used it today to evaluate the variable left-margin and got the following value:
0 (#o0, #x0)
0 is the actual value, but I'm oblivious to what the other symbols mean.
If I evaluate the following with eval-last-sexp (C-x C-e) I just get the value alone:
(identity left-margin)
-> 0
Can someone shed some light on what those symbols mean and why they appear only with eval-expression? Thanks.
It is the octal and hexadecimal representation of 0. The prefix #o means "octal representation follows" and #x means "hexadecimal representation follows".
To verify, first use set-variable to set the variable to, e.g., 10, and then you'll get:
10 (#o12, #xa)
a is 10 in hex, and 12 is 10 in octal form.
This is a part of my data.
ªU€ÿ ÿ dô # #›ÿÿ;< …æ ³ 3m ...
It is saved in a file. When I look at it with a hex-editor I can see the hex-values. How can I read this "hex-data" with matlab?
EDIT: I get this error:
??? Error using ==> hex2dec at 38
Input string found with characters other than 0-9, a-f, or A-F.
with this code:
a = fread(fid,1,'uint32','l');
fprintf('%X',a)
b = hex2dec(a);
hex2dec() expects a hexadecimal number string as input.
>> hex2dec('28')
With your fread statement, your 'a' variable is a numeric value (four bytes read as a uint32) rather than a string, hence the error message: the precision argument has already converted the raw bytes to the numeric type you declared. If you want to pass this value through hex2dec, you'll need to create a string input.
>> hex2dec(num2str(28));
Do you know the format of your binary file? i.e. is the first value of the data an integer*4?
EDIT: added hex output
In response to the comment: as you read the data in, MATLAB converts the binary data stream into the format you defined. If you want the stream as hexadecimal text, the simplest way is to convert the values back into hexadecimal.
a=dec2hex(fread(fid))
'a' will be a list of all the values in hexadecimal format and should match what you see in your hex editor.
q=dec2bin(hex2dec(num2str(p)))
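The same distinction between raw bytes, a numeric value, and hexadecimal text can be shown in Python. This is a sketch under stated assumptions: the four sample bytes and the temporary file are hypothetical stand-ins for the real data file, and the little-endian uint32 interpretation mirrors the 'uint32','l' arguments in the question.

```python
import os
import struct
import tempfile

# Hypothetical sample bytes standing in for the start of a binary file.
sample = bytes([0xAA, 0x55, 0x80, 0xFF])

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(sample)
    path = f.name

try:
    with open(path, "rb") as f:  # binary mode: no text decoding is applied
        raw = f.read()
finally:
    os.remove(path)

hex_text = raw.hex()                     # hex digits, byte by byte, as in a hex editor
(value,) = struct.unpack("<I", raw[:4])  # the same 4 bytes as a little-endian uint32
hex_from_value = format(value, "X")      # numeric value converted back to hex text

print(hex_text, value, hex_from_value)
```

Note that the byte-by-byte dump and the hex form of the uint32 list the bytes in opposite order, because the little-endian read puts the last byte in the most significant position.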