SQL Server LEN function reports wrong result - tsql

Let's say we cast an int into a binary value, e.g.
cast(120 as binary(8)), or any other int into binary(8).
We would normally expect len(cast(120 as binary(8))) = 8, and this holds unless we try the number 32: select len(cast(32 as binary(8))) returns 7!
Is this a bug in SQL Server?

Not a bug; it's how LEN works. LEN:
Returns the number of characters of the specified string expression,
excluding trailing spaces.
The definition of "trailing space" seems to differ based on the datatype. For binary values, a trailing space is a trailing byte 0x20 (the ASCII code for a space character). In the Books Online (BOL) entry for LEN there is a note that reads,
Use the LEN to return the number of characters encoded into a given
string expression, and DATALENGTH to return the size in bytes for a
given string expression. These outputs may differ depending on the
data type and type of encoding used in the [value]. For more
information on storage differences between different encoding types,
see Collation and Unicode Support.
With binary, the length (LEN) is reduced by 1 for values that end with 0x20, by 2 for values that end with 0x2020, and so on. Again, it's treating that byte like a trailing space. DATALENGTH resolves this. Note this SQL:
DECLARE
  @string VARCHAR(100) = '1234567 ',  -- seven characters plus one trailing space
  @binary BINARY(8) = 32;             -- 0x0000000000000020: the last byte is 0x20
SELECT [Type] = 'string', [Len] = LEN(@string), [Datalength] = DATALENGTH(@string)
UNION ALL
SELECT [Type] = 'binary(8)', [Len] = LEN(@binary), [Datalength] = DATALENGTH(@binary);
Returns:
Type Len Datalength
--------- ----------- -----------
string 7 8
binary(8) 7 8
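To make the rule concrete, here is a small Python sketch (not SQL Server code) that models LEN on a binary value as "byte count minus any run of trailing 0x20 bytes" and DATALENGTH as the plain byte count. The helper names are hypothetical, and the model assumes the big-endian, zero-padded layout that the results above show for CAST(n AS binary(8)).

```python
def sql_len_binary(value: bytes) -> int:
    # Model of LEN on binary: trailing 0x20 bytes are treated like trailing spaces
    return len(value.rstrip(b"\x20"))

def sql_datalength(value: bytes) -> int:
    # Model of DATALENGTH: the full size in bytes
    return len(value)

def as_binary8(n: int) -> bytes:
    # Big-endian, zero-padded 8-byte image, like CAST(n AS binary(8)) for a non-negative int
    return n.to_bytes(8, "big")

for n in (120, 32, 8224):
    b = as_binary8(n)
    print(n, "0x" + b.hex().upper(), sql_len_binary(b), sql_datalength(b))
```

Zero bytes are not stripped, which matches the observation that only values ending in 0x20 report a shorter LEN.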
Using my rangeAB function, which returns one row per number in a given range (here 0 to 10,000), I created this query:
SELECT
N = r.RN,
Binaryvalue = CAST(r.RN AS binary(8)),
[Len] = LEN(CAST(r.RN AS binary(8))),
[DataLength] = DATALENGTH(CAST(r.RN AS binary(8)))
FROM dbo.rangeAB(0,10000,1,0) AS r
WHERE LEN(CAST(r.RN AS binary(8))) <> 8
ORDER BY N;
Note these results:
N Binaryvalue Len DataLength
-------------------- ------------------ ----------- -----------
32 0x0000000000000020 7 8
288 0x0000000000000120 7 8
544 0x0000000000000220 7 8
800 0x0000000000000320 7 8
1056 0x0000000000000420 7 8
1312 0x0000000000000520 7 8
1568 0x0000000000000620 7 8
1824 0x0000000000000720 7 8
2080 0x0000000000000820 7 8
2336 0x0000000000000920 7 8
2592 0x0000000000000A20 7 8
2848 0x0000000000000B20 7 8
3104 0x0000000000000C20 7 8
3360 0x0000000000000D20 7 8
3616 0x0000000000000E20 7 8
3872 0x0000000000000F20 7 8
4128 0x0000000000001020 7 8
4384 0x0000000000001120 7 8
4640 0x0000000000001220 7 8
4896 0x0000000000001320 7 8
5152 0x0000000000001420 7 8
5408 0x0000000000001520 7 8
5664 0x0000000000001620 7 8
5920 0x0000000000001720 7 8
6176 0x0000000000001820 7 8
6432 0x0000000000001920 7 8
6688 0x0000000000001A20 7 8
6944 0x0000000000001B20 7 8
7200 0x0000000000001C20 7 8
7456 0x0000000000001D20 7 8
7712 0x0000000000001E20 7 8
7968 0x0000000000001F20 7 8
8224 0x0000000000002020 6 8
8480 0x0000000000002120 7 8
8736 0x0000000000002220 7 8
8992 0x0000000000002320 7 8
9248 0x0000000000002420 7 8
9504 0x0000000000002520 7 8
9760 0x0000000000002620 7 8
Note how the LEN of CAST(8224 AS binary(8)) is 6: 8224 is 0x2020, so the value ends with two 0x20 bytes, which are treated like two trailing spaces:
8224 0x0000000000002020 6 8
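The pattern in the result set can be checked with a short Python sketch (an illustration only, assuming the same trailing-0x20 model of LEN): only numbers whose last byte is 0x20, i.e. n % 256 == 32, are affected, and within 0 to 10,000 only 8224 (0x2020) loses two bytes.

```python
def len_binary8(n: int) -> int:
    # LEN-style count: 8 bytes minus the run of trailing 0x20 bytes
    return len(n.to_bytes(8, "big").rstrip(b"\x20"))

# Same filter as the rangeAB query: 0..10000 where LEN <> 8
short = [(n, len_binary8(n)) for n in range(0, 10001) if len_binary8(n) != 8]

print(len(short))                       # number of affected values
print([n for n, l in short if l == 6])  # values ending in 0x2020
```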

Related

How to return a cross product of a binary function in KDB

I would like to explore a better way to apply a binary function that iterates over each element of its two arguments. To make the question simpler, let's use the function below as an example:
q)func:{x+y}
q)a:til 10
q)a
0 1 2 3 4 5 6 7 8 9
q)b:a
q)b
0 1 2 3 4 5 6 7 8 9
What I want to get is the cross product: each element of one argument is paired with each element of the other, and the function is applied to every pair. My expected result is
0 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 10 2 3 4 5 6 7 8 9 10 11 3 4 5 6 7 8 9 10 11 12 4 5 6 7 8 9 10 11 12 13 5 6 7 8 9 10 11 12 13 14 6 7 8 9 10 11 12 13 14 15 7 8 9 10 11 12 13 14 15 16 8 9 10 11 12 13 14 15 16 17 9 10 11 12 13 14 15 16 17 18
My current solution is to cross the argument lists first:
(func/) each (a cross b)
I wonder whether there is a better way to do that. Simply using func'[a;b] only gives a pairwise result, which is not what I want.
The following should be what you are looking for:
a +/:\: b
The same applies to other functions too, for example:
a {x mod y}/:\: b
You do not need cross for this, just each-right (/:) or each-left (\:). Because + is a vector function, you can iterate over one list and use the other list as a whole vector:
q)a+/:b
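For readers less familiar with q iterators, here is a Python sketch of what a f/:\: b computes: one row per element x of a (each-left), and within each row f applied to x and every element y of b (each-right). The names each_left_each_right and raze are hypothetical stand-ins for illustration; the q form returns the nested rows, so the sketch flattens them to compare with the flat expected result in the question.

```python
def each_left_each_right(f, xs, ys):
    # Mirrors q's  xs f/:\: ys : for every x in xs (each-left),
    # apply f to x and every y in ys (each-right); one row per x.
    return [[f(x, y) for y in ys] for x in xs]

def raze(rows):
    # Flatten one level, like q's raze
    return [v for row in rows for v in row]

a = list(range(10))
b = a
result = raze(each_left_each_right(lambda x, y: x + y, a, b))
print(result[:20])  # first two rows: 0..9 then 1..10
```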

Select prior n variables to sum in SAS

I have several data files and totals within those data files that I need to re-calculate.
The variables are broken out by race/ethnicity * sex and then a total is given.
The pattern is repeated for several measures and I cannot re-structure the data files. I have to keep the structure intact.
UPDATED: For example, below are the first 32 variables (and 10 rows of data) in one of the files: Hispanic males, Hispanic females, American Indian males, American Indian females, ..., total males, and total females for grade 8 and then grade 9.
I have over 100 of these totals to compute, so I want to automate the process. How can I select the 7 prior variables that end in _M (or _F) to sum, or something to that effect? Thanks in advance!
G08_HI_M G08_HI_F G08_AM_M G08_AM_F G08_AS_M G08_AS_F G08_HP_M G08_HP_F G08_BL_M G08_BL_F G08_WH_M G08_WH_F G08_TR_M G08_TR_F TOT_G08_M TOT_G08_F G09_HI_M G09_HI_F G09_AM_M G09_AM_F G09_AS_M G09_AS_F G09_HP_M G09_HP_F G09_BL_M G09_BL_F G09_WH_M G09_WH_F G09_TR_M G09_TR_F TOT_G09_M TOT_G09_F
5 2 9 6 2 3 6 9 7 4 1 4 8 4 . . 7 11 2 13 4 2 14 10 10 13 2 11 9 5 . .
7 1 8 10 2 4 8 0 1 2 8 3 4 5 . . 7 13 12 13 5 15 3 2 2 13 11 15 3 15 . .
7 8 10 9 0 4 7 9 8 0 3 10 7 1 . . 15 9 11 9 11 9 6 7 14 9 12 8 6 14 . .
4 8 9 0 10 6 4 3 10 9 2 5 8 2 . . 13 2 5 13 3 14 5 15 10 15 7 11 9 6 . .
7 6 5 1 4 5 7 4 5 1 8 3 4 4 . . 9 7 7 2 4 8 3 4 3 10 9 8 7 7 . .
3 1 0 2 4 10 2 10 5 9 7 1 8 8 . . 7 9 5 7 13 6 12 13 10 6 2 13 3 12 . .
5 7 4 1 7 9 6 8 3 1 3 2 10 4 . . 14 12 8 5 6 2 2 5 6 4 12 6 4 5 . .
8 9 3 2 3 10 6 5 9 10 8 1 4 5 . . 10 2 3 8 3 15 3 14 9 14 3 12 4 12 . .
4 3 2 6 4 1 2 5 5 6 4 5 4 1 . . 3 14 12 12 15 10 14 11 5 8 9 14 7 15 . .
1 10 4 2 1 3 9 8 3 3 3 0 3 1 . . 12 9 5 7 14 9 13 9 6 14 5 7 13 13 . .
Is this your data? You mention you cannot change the data structure, so you will need to "program" the variable names. Below is an example that creates arrays (variable lists) grouping the names by grade and sex, as implied by your TOT_: variables.
data G;
input G08_HI_M G08_HI_F G08_AM_M G08_AM_F G08_AS_M G08_AS_F G08_HP_M G08_HP_F
G08_BL_M G08_BL_F G08_WH_M G08_WH_F G08_TR_M G08_TR_F TOT_G08_M TOT_G08_F
G09_HI_M G09_HI_F G09_AM_M G09_AM_F G09_AS_M G09_AS_F G09_HP_M G09_HP_F
G09_BL_M G09_BL_F G09_WH_M G09_WH_F G09_TR_M G09_TR_F TOT_G09_M TOT_G09_F;
cards;
5 2 9 6 2 3 6 9 7 4 1 4 8 4 . . 7 11 2 13 4 2 14 10 10 13 2 11 9 5 . .
7 1 8 10 2 4 8 0 1 2 8 3 4 5 . . 7 13 12 13 5 15 3 2 2 13 11 15 3 15 . .
7 8 10 9 0 4 7 9 8 0 3 10 7 1 . . 15 9 11 9 11 9 6 7 14 9 12 8 6 14 . .
4 8 9 0 10 6 4 3 10 9 2 5 8 2 . . 13 2 5 13 3 14 5 15 10 15 7 11 9 6 . .
7 6 5 1 4 5 7 4 5 1 8 3 4 4 . . 9 7 7 2 4 8 3 4 3 10 9 8 7 7 . .
3 1 0 2 4 10 2 10 5 9 7 1 8 8 . . 7 9 5 7 13 6 12 13 10 6 2 13 3 12 . .
5 7 4 1 7 9 6 8 3 1 3 2 10 4 . . 14 12 8 5 6 2 2 5 6 4 12 6 4 5 . .
8 9 3 2 3 10 6 5 9 10 8 1 4 5 . . 10 2 3 8 3 15 3 14 9 14 3 12 4 12 . .
4 3 2 6 4 1 2 5 5 6 4 5 4 1 . . 3 14 12 12 15 10 14 11 5 8 9 14 7 15 . .
1 10 4 2 1 3 9 8 3 3 3 0 3 1 . . 12 9 5 7 14 9 13 9 6 14 5 7 13 13 . .
;;;;
run;
You can do something like this to create arrays that can be used in SAS statistics functions.
proc transpose data=g(obs=0 drop=tot_:) out=gnames;
var _all_;
run;
data gnames;
set gnames;
g = input(substrn(_name_,2,3),2.);
length race $2; race=scan(_name_,2,'_');
length sex $1; sex =scan(_name_,3,'_');
run;
proc sort;
by g sex;
run;
proc print;
run;
data _null_;
set gnames;
by g sex;
if first.sex then put +3 'Array G' g z2. sex $1. '[*] ' @;
put _name_ @;
if last.sex then put ';';
run;
The step writes these ARRAY statements to the log:
Array G08F[*] G08_HI_F G08_AM_F G08_AS_F G08_HP_F G08_BL_F G08_WH_F G08_TR_F ;
Array G08M[*] G08_HI_M G08_AM_M G08_AS_M G08_HP_M G08_BL_M G08_WH_M G08_TR_M ;
Array G09F[*] G09_HI_F G09_AM_F G09_AS_F G09_HP_F G09_BL_F G09_WH_F G09_TR_F ;
Array G09M[*] G09_HI_M G09_AM_M G09_AS_M G09_HP_M G09_BL_M G09_WH_M G09_TR_M ;
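The same grouping idea can be sketched in Python (an illustration, not the SAS answer itself): drop the TOT_ variables, key each remaining name by its grade digits and sex suffix, sort stably, and emit one ARRAY statement per group. The function name array_statements is hypothetical, and the sketch assumes names follow the Gnn_RACE_S pattern shown in the question.

```python
from itertools import groupby

def array_statements(varnames):
    # Drop the totals, like drop=tot_: in the PROC TRANSPOSE step
    parts = [n for n in varnames if not n.upper().startswith("TOT_")]

    # Key each name by (grade digits, sex), e.g. "G08_HI_F" -> ("08", "F")
    def key(name):
        pieces = name.split("_")
        return (pieces[0][1:], pieces[2])

    # sorted() is stable, so the race order within each group is preserved
    lines = []
    for (grade, sex), group in groupby(sorted(parts, key=key), key=key):
        lines.append(f"Array G{grade}{sex}[*] " + " ".join(group) + " ;")
    return lines

names = ("G08_HI_M G08_HI_F G08_AM_M G08_AM_F G08_AS_M G08_AS_F G08_HP_M G08_HP_F "
         "G08_BL_M G08_BL_F G08_WH_M G08_WH_F G08_TR_M G08_TR_F TOT_G08_M TOT_G08_F").split()

for line in array_statements(names):
    print(line)
```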
It seems the totals are interspersed between the variables to be summed, so could we sum "every variable since the last total that meets some criterion, such as ending in '_F'"?
This can, for example, be done as below. I used a simplified data set, but the totals are the sum of every variable since the last total for each sex. I used PROC CONTENTS to get the variable list, then walk down that list building one sum expression for males and one for females. When a variable whose name contains tot is encountered, a finished line of the form tot1_M=sum(var1_M,var2_M,var3_M); is output. These lines are collected in the macro variable totals and inserted in a DATA step.
If you know it's always 7 variables for males, 7 for females, and then the totals, you can rely on position rather than name; there is an easier solution for that case below.
data old;
var1_M=1;
var1_F=2;
var2_M=3;
var2_F=4;
var3_M=5;
var3_F=6;
tot1_M=.;
tot1_F=.;
var4_M=7;
var4_F=8;
var5_M=9;
var5_F=10;
var6_M=11;
var6_F=12;
tot2_M=.;
tot2_F=.;
run;
proc contents data=old out=contents noprint;
run;
proc sort data=contents;
by varnum;
run;
data temp;
set contents;
length sumline_F sumline_M $400;
if _n_=1 then do;
sumline_M="sum(";
sumline_F="sum(";
end;
retain sumline_M sumline_F;
if find(name, "_M")>0 and find(name,"tot")=0 then sumline_M=cat(strip(sumline_M),strip(name), ", ");
else if find(name, "_F")>0 and find(name,"tot")=0 then sumline_F=cat(strip(sumline_F), strip(name), ", ");
if find(name,"tot")>0 and find(name,"_M")>0 then do;
sumline_M=substr(sumline_M,1, length(sumline_M)-1);
finline=cat(strip(name), "=", strip(sumline_M),");");
sumline_M="sum(";
end;
if find(name,"tot")>0 and find(name,"_F")>0 then do;
sumline_F=substr(sumline_F,1, length(sumline_F)-1);
finline=cat(strip(name), "=", strip(sumline_F),");");
sumline_F="sum(";
end;
run;
proc sql;
select finline
into :totals separated by " "
from temp
where not missing(finline);
quit;
data new;
set old;
&totals;
run;
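The name-based walk above can be sketched in Python to show the logic without the SAS string handling: keep one pending list per sex, append ordinary variables to it, and when a total is reached emit a statement and reset that list. The function name is hypothetical, and unlike the SAS code (which uses find anywhere in the name) this sketch assumes the sex marker is the last two characters of the name.

```python
def build_total_statements(varnames):
    # Variables must be in VARNUM (column) order, as after PROC SORT by varnum
    pending = {"_M": [], "_F": []}
    statements = []
    for name in varnames:
        suffix = name[-2:]               # assumption: "_M" / "_F" is the suffix
        if suffix not in pending:
            continue
        if "tot" in name.lower():        # a total: sum everything seen since the last one
            statements.append(f"{name}=sum({','.join(pending[suffix])});")
            pending[suffix] = []
        else:
            pending[suffix].append(name)
    return statements

names = ["var1_M", "var1_F", "var2_M", "var2_F", "var3_M", "var3_F", "tot1_M", "tot1_F",
         "var4_M", "var4_F", "var5_M", "var5_F", "var6_M", "var6_F", "tot2_M", "tot2_F"]
print(" ".join(build_total_statements(names)))  # the text that would go into &totals
```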
If the order is always the same (always Male-female), you could go like this:
/* Defining data. Note that _M _F are always alternating, with no variables missing*/
data old;
var1_M=1;
var1_F=2;
var2_M=3;
var2_F=4;
var3_M=5;
var3_F=6;
var4_M=5;
var4_F=6;
var5_M=5;
var5_F=6;
var6_M=5;
var6_F=6;
var7_M=5;
var7_F=6;
tot1_M=.;
tot1_F=.;
var8_M=7;
var8_F=8;
var9_M=9;
var9_F=10;
var10_M=11;
var10_F=12;
var11_M=11;
var11_F=12;
var12_M=11;
var12_F=12;
var13_M=11;
var13_F=12;
var14_M=11;
var14_F=12;
tot2_M=.;
tot2_F=.;
run;
/* We have 7 _M and 7 _F variables, so the first total variable is number 15 and the next is 16. Adding 16 gives the positions of the next pair of total variables. */
data totals;
do i=15 to 200 by 16;
output;
end;
do i=16 to 200 by 16;
output;
end;
run;
/* Puts the index of the sum variables into a macro variable*/
proc sql;
select i
into :sumvars separated by " "
from totals;
quit;
/* Loop over the variables using an array. If a variable is a total, it is the sum of the 7 preceding same-sex variables, i.e. every other variable going back. */
data new;
set old;
array vars{*} _all_;
do i=1 to dim(vars);
if i in (&sumvars) then do;
vars{i}=sum(vars{i-2}, vars{i-4}, vars{i-6}, vars{i-8}, vars{i-10}, vars{i-12}, vars{i-14});
end;
end;
drop i;
run;
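The positional rule can be sketched in Python on a single row, as an illustration of the index arithmetic rather than a SAS replacement. It assumes each block is 7 alternating M/F pairs followed by tot_M and tot_F (16 columns), that the row length is a multiple of 16, and it uses None for SAS missing values, skipping them in the sums much as the SAS sum function does (though an all-missing group gives 0 here, not missing).

```python
def fill_positional_totals(row, group_size=7):
    block = 2 * group_size + 2            # 7 M + 7 F + 2 totals = 16 columns
    out = list(row)
    for start in range(0, len(out), block):
        tot_m = start + 2 * group_size    # 0-based index of tot_M (SAS position 15, 31, ...)
        tot_f = tot_m + 1
        # Every other column before the totals: M at start, start+2, ...; F at start+1, ...
        out[tot_m] = sum(v for v in out[start:tot_m:2] if v is not None)
        out[tot_f] = sum(v for v in out[start + 1:tot_m:2] if v is not None)
    return out

# First data row from the question; None marks the missing totals
row = [5, 2, 9, 6, 2, 3, 6, 9, 7, 4, 1, 4, 8, 4, None, None,
       7, 11, 2, 13, 4, 2, 14, 10, 10, 13, 2, 11, 9, 5, None, None]
print(fill_positional_totals(row))
```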

(q/kdb+) Generating an automated list

Example 1)
I have the code below
5#10+1*2
that generates
index value
0 12
1 12
2 12
3 12
4 12
How can I replace the number "1" with the index, i.e. evaluate the following for each index,
generating:
5#10+index*2
index value
0 10
1 12
2 14
3 16
4 18
Update, example 2)
Now, suppose I have
mult:5;
t:([]numC:1 3 6 4 1;s:50 16 53 6 33);
update lst:(numC#'s) from t
The last update generates
numC s lst
1 50 50
3 16 16 16 16
6 53 53 53 53 53 53 53
4 6 6 6 6 6
1 33 33
How can I generate the "lst" column as per below?
numC s lst
1 50 50+0*mult
3 16 16+0*mult 16+1*mult 16+2*mult
6 53 53+0*mult 53+1*mult 53+2*mult 53+3*mult 53+4*mult 53+5*mult
4 6 6+0*mult 6+1*mult 6+2*mult 6+3*mult
1 33 33+0*mult
I tried something like
update lst:(numC#'s + (til numC)*mult) from t
but I am getting an error
ERROR: 'type
Thanks vm
Is this what you're looking for:
q)x:5
q)x#10+(til x)*2
10 12 14 16 18
http://code.kx.com/q/ref/arith-integer/#til
You can remove the take (#) and use til alone to simplify to:
q)10+2*til 5
10 12 14 16 18
til 5 creates a list of 5 elements (0 to 4), so you do not need to take 5 elements from the result. Take is only required if your list of indices is longer than 5.
Update:
For your second example the following should work:
q)update lst:{y+x*til z}'[mult;s;numC] from t
q)update lst:s+mult*til each numC from t
numC s lst
-------------------------
1 50 ,50
3 16 16 21 26
6 53 53 58 63 68 73 78
4 6 6 11 16 21
1 33 ,33
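As a cross-check of both answers, here is a Python sketch of the same arithmetic (an illustration, not q): start + step * til count is an arithmetic run, and s + mult * til each numC builds one such run per row. The helper names arithmetic_run and build_lst are hypothetical.

```python
def arithmetic_run(start, step, count):
    # Equivalent of q's  start + step * til count
    return [start + step * k for k in range(count)]

def build_lst(num_c, s, mult):
    # Equivalent of q's  s + mult * til each numC : one run per row
    return [arithmetic_run(base, mult, n) for n, base in zip(num_c, s)]

print(arithmetic_run(10, 2, 5))  # [10, 12, 14, 16, 18]
print(build_lst([1, 3, 6, 4, 1], [50, 16, 53, 6, 33], 5))
```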
There are many ways to achieve this:
1) 10+2*til 5
2) (2*til 5) + 10
/ take operator: The dyadic take function creates lists. The left argument specifies the count and shape and the right argument provides the data.
It is useful for selecting from the front or end of a list.
https://code.kx.com/wiki/Reference/NumberSign
q)5#0 1 2 3 4 5 6 7 8 / take the first 5 items
0 1 2 3 4
q)-5#0 1 2 3 4 5 6 7 8 / take the last 5 elements
4 5 6 7 8
Use the take operator # only when it is required.
Say we have 10 elements, of which we need five in the output; then we can use:
5#10+2*til 10
/ The til function takes a non-negative integer argument x and returns the first x natural numbers, 0 to x-1
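The take examples above can be mirrored in Python to show the front/back behaviour. This is a sketch under a stated assumption: it only models |n| <= count xs, because q's overtake (asking for more items than the list holds) is not described here and is deliberately rejected rather than guessed at. The function name take is hypothetical.

```python
def take(n, xs):
    # Sketch of q's n#xs for |n| <= count xs:
    # positive n takes from the front, negative n takes from the back.
    if abs(n) > len(xs):
        raise ValueError("overtake is not modelled in this sketch")
    return xs[:n] if n >= 0 else xs[n:]

xs = list(range(9))
print(take(5, xs))    # [0, 1, 2, 3, 4]
print(take(-5, xs))   # [4, 5, 6, 7, 8]
print(take(5, [10 + 2 * i for i in range(10)]))  # like 5#10+2*til 10
```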

Matlab : how to read a constant width text file and turn it into a matrix?

I have an ASCII text file; each row has the following format:
------------------------------
Variable Columns Type
------------------------------
ID 1-11 Character
YEAR 12-15 Integer
MONTH 16-17 Integer
ELEMENT 18-21 Character
VALUE1 22-26 Integer
MFLAG1 27-27 Character
QFLAG1 28-28 Character
SFLAG1 29-29 Character
VALUE2 30-34 Integer
MFLAG2 35-35 Character
QFLAG2 36-36 Character
SFLAG2 37-37 Character
. . .
. . .
. . .
VALUE31 262-266 Integer
MFLAG31 267-267 Character
QFLAG31 268-268 Character
SFLAG31 269-269 Character
------------------------------
I only need the variables YEAR, MONTH, ELEMENT and VALUEi, i = 1,2,...,31 (there are 31 values in each row).
Parameters (like MFLAGi) can have a character in their place or white space.
Also, a value might not fill all of its space with digits, so there can be spaces.
Two sample lines from my text file:
USC00190736189301TMAX 33 6 117 6 0 I6 -89 6 -28 6 -83 6 -67 6 -67 6 -28 6 -6 6 -139 6 -111 6 -117 6 -89 6 -106 6 -111 6 -106 6 -106 6 -39 6 -78 6 -61 6 -33 6 -6 6 6 6 39 6 28 6 6 6 -61 6 61 6 56 6 0 6
USC00190736189301TMIN -56 6 11 I6 -106 6 -161 6 -106 6 -133 6 -144 6 -117 6 -161 6 -156 6 -206 6 -183 6 -161 6 -161 6 -139 6 -178 6 -189 6 -161 6 -133 6 -150 6 -156 6 -156 6 -100 6 -50 6 -39 6 -67 6 -78 6 -111 6 -94 6 -33 6 -50 6
For example, in line 1 VALUE1 uses only 2 of its 5 columns (' 33'),
and both MFLAG1 and QFLAG1 are white space.
I want to put YEAR, MONTH, ELEMENT and VALUEi in a matrix and, depending on the ELEMENT value, choose some of the rows to build my final matrix. How can I do that?
What I have thought of:
%open file
fid = fopen('myt.txt')
% read from file
%'whitespace','' do not overlook white spaces in counting
C = textscan(fid , formatspec ,'whitespace','')
I have two problems with this:
(1) I think formatspec should be
'%*11c %4d %2d %4c %5d %*3c'
where the fields are: ignore (ID), year, month, element, valuei, ignore (the three flags). The final '%5d %*3c' part has to be repeated 31 times. How can I repeat that part 31 times and concatenate all the parts together?
(2) I end up with a cell array C, and since ELEMENT is a string I can't turn it into a matrix. Apparently C is organized column by column, and each column is a whole string. How can I then access the data row by row to select the rows I need (according to the value of ELEMENT)?
Am I using the wrong method to do what I want? What should I do?
For (1), you can use repmat:
idspec = ['%*11c %4d %2d %4c '];
valuespec = repmat('%5d %*3c',[1 31]);
filespec = [idspec valuespec];
(or something similar)
For (2), I can see a few options:
a) You could read the file twice: once ignoring the character column and using the 'collectoutput' option, so that C basically contains a matrix; then read it again ignoring everything but ELEMENT, so that C holds the remaining info.
b) Using 'collectoutput' alone, you'd get C with the year and month, then ELEMENT, and then the rest.
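For comparison, here is a Python sketch (not MATLAB, and not the textscan approach above) that slices each record by the fixed column positions from the layout table and keeps only rows with a chosen ELEMENT. It assumes lines keep their full 269-column width with spaces intact; blank VALUE fields become None. The function names are hypothetical.

```python
def parse_record(line):
    # Column positions from the layout table (1-based, inclusive) -> 0-based slices
    year = int(line[11:15])       # YEAR, columns 12-15
    month = int(line[15:17])      # MONTH, columns 16-17
    element = line[17:21]         # ELEMENT, columns 18-21
    values = []
    for i in range(31):
        start = 21 + 8 * i        # VALUEi starts at column 22 + 8*(i-1); 3 flag columns follow
        field = line[start:start + 5]
        values.append(int(field) if field.strip() else None)  # blank -> missing
    return year, month, element, values

def select_rows(lines, wanted_element):
    # Rows of [year, month, v1, ..., v31] for records whose ELEMENT matches
    rows = []
    for line in lines:
        year, month, element, values = parse_record(line.rstrip("\r\n"))
        if element == wanted_element:
            rows.append([year, month] + values)
    return rows
```

Usage would be along the lines of `select_rows(open("myt.txt"), "TMAX")`, giving a list of numeric rows that can be converted to a matrix.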

Matlab Special Matrix

Is there a MATLAB function to generate this matrix?:
[1 2 3 4 5 6 7 ... n;
2 3 4 5 6 7 8 ... n+1;
3 4 5 6 7 8 9 ... n+2;
...;
n n+1 n+2 ... 2*n-1];
Is there a name for it?
Thanks.
Yes, indeed there's a name for that matrix: it's known as a Hankel matrix.
Use the hankel function in MATLAB:
out = hankel(1:n,n:2*n-1);
Example with n=10:
out =
1 2 3 4 5 6 7 8 9 10
2 3 4 5 6 7 8 9 10 11
3 4 5 6 7 8 9 10 11 12
4 5 6 7 8 9 10 11 12 13
5 6 7 8 9 10 11 12 13 14
6 7 8 9 10 11 12 13 14 15
7 8 9 10 11 12 13 14 15 16
8 9 10 11 12 13 14 15 16 17
9 10 11 12 13 14 15 16 17 18
10 11 12 13 14 15 16 17 18 19
Alternatively, you may be inclined to want a bsxfun based approach. That is certainly possible:
out = bsxfun(@plus, (1:n), (0:n-1).');
The reason I wanted to show you this approach is that in your answer you used repmat to generate the two matrices to add together. You can replace the two repmat calls with bsxfun, as it does the replication under the hood.
The above solution is for older MATLAB versions that did not have implicit broadcasting. In recent versions of MATLAB (R2016b and later), you can simply do:
out = (1:n) + (0:n-1).';
My standard approach is
repmat(1:n,n,1)+repmat((1:n)',1,n)-1
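All of these MATLAB answers build the same outer sum: the entry in row i, column j (1-based) is i + j - 1, which is constant along each anti-diagonal, the defining property of a Hankel matrix. A minimal Python sketch of that formula, with a hypothetical function name, for readers checking the construction outside MATLAB:

```python
def hankel_like(n):
    # Entry at 1-based (row i, col j) is i + j - 1; with 0-based indices that's i + j + 1
    return [[i + j + 1 for j in range(n)] for i in range(n)]

for row in hankel_like(4):
    print(row)
```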