Extra and misaligned rows when importing data in SAS - import

I am trying to import data in SAS like:
A B C D E
x y z h i
s1 s2 s3 s4 s5
where A B C D and E are column names.
I have 240 columns in my dataset and the code I am using is:
data INFO;
infile Attdata notab dlm='09'x dsd missover LRECL = 100000000;
length A B C D E $200; /* the real dataset has 240 columns */
input A B C D E;
run;
Whenever I import the data, some values from columns B, C, etc. get stacked below the rows of column A:
A B C D E
x h i
s1 s2 s3 s5
y s4
z
Is there a way to fix this? Do I need to change the LRECL option? My data comes out misaligned after running this code. Could the LENGTH statement be the problem?

It may be something to do with missover (rather than LRECL). I have found this site to be useful in the past: http://www2.sas.com/proceedings/sugi26/p009-26.pdf
Are you saying in your question that data from line 1 is appearing on lines 3 and 4 and some data from line 2 is on line 3? I've never seen SAS do this before.
You may want to check your delimiter/end of line characters.
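One way to check the delimiter and end-of-line characters outside SAS is to count them in the raw bytes. The following is only a minimal Python sketch, not part of the original answer; the function name `inspect_delimited` and the sample bytes are made up for illustration, and it assumes a tab-delimited file as in the question.

```python
from collections import Counter

def inspect_delimited(raw: bytes, delimiter: bytes = b"\t") -> dict:
    """Summarise line endings and per-line field counts in raw file bytes."""
    crlf = raw.count(b"\r\n")
    endings = {
        "CRLF": crlf,
        "LF only": raw.count(b"\n") - crlf,
        "CR only": raw.count(b"\r") - crlf,
    }
    # bytes.splitlines() breaks on \n, \r and \r\n, so a stray line break
    # inside a record shows up as extra lines with too few fields.
    field_counts = Counter(line.count(delimiter) + 1 for line in raw.splitlines())
    return {"endings": endings, "field_counts": dict(field_counts)}

# Hypothetical sample: the second record is split by a bare LF.
sample = b"A\tB\tC\tD\tE\r\nx\ty\nz\th\ti\r\ns1\ts2\ts3\ts4\ts5\r\n"
report = inspect_delimited(sample)
print(report)
```

For a real file, the bytes would come from something like `open(path, "rb").read()`, where `path` is whatever the Attdata fileref points to. Mixed line endings, or lines whose field count differs from the expected number of columns, would point to line breaks embedded in the data.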


KDB+ Merging multiple update statements

q)d:([] f1:`a`b` ;f2:```c; m1:`x``z;m2:``y`z)
f1 f2 m1 m2
-----------
a     x
b        y
   c  z  z
I want to update the f1 and m1 columns from f2 and m2 respectively wherever f1 or m1 is null. Specifically, I want to merge these two queries into one update statement:
update f1:f2 from d where null f1
update m1:m2 from d where null m1
An alternative you might like to consider is fill (^), which lets you fill nulls in one list with items from another list (in this case, the lists are columns in the table), e.g.
q)d:([] f1:`a`b` ;f2:```c; m1:`x``z;m2:``y`z)
q)update f2^f1,m2^m1 from d
f1 f2 m1 m2
-----------
a     x
b     y  y
c  c  z  z
You can use the triadic vector conditional, ?:
?[vb;exprtrue;exprfalse]
The new query would be :
q)update f1:?[null f1;f2;f1] , m1:?[null m1;m2;m1] from d
f1 f2 m1 m2
-----------
a     x
b     y  y
c  c  z  z
Fill can be used to update nulls:
If you want to update table d in place, then you can use:
update f2^f1,m2^m1 from`d
or
![`d;();0b;`f1`m1!((^;`f2;`f1);(^;`m2;`m1))]
If you want to display the output of update without updating original table, then:
update f2^f1,m2^m1 from d
or
![d;();0b;`f1`m1!((^;`f2;`f1);(^;`m2;`m1))]

How to keep a loop running in MATLAB after a struct condition is satisfied, and show the full warning message?

Let's say I have a struct GroupA with three 1x1 struct elements, A, B and C, and each element has five fields: ID, E, F, G and h. I need to check whether h is the same in any two of them and, if so, give a warning such as "h is same in A & B". For example:
struct GroupA
A B C
ID ID ID
E E E
F F F
G G G
h h h
I wrote
for B_card=1:size(GroupA,2)-1
    for C_card=(B_card+1):size(GroupA,2)
        if strcmp(GroupA(B_card).h,GroupA(C_card).h)==1
            warning('The h is same in',GroupA(B_card).ID,'&',GroupA(C_card).ID);
        end
    end
end
I have two problems. First, I can't tell whether the loop ends once the if condition is satisfied. Second, the warning message only shows "The h is same in". I am quite new to MATLAB, so I have explained this as well as I can; please let me know if you need more explanation. Thanks for your help.
for B_card=1:size(GroupA,2)-1
    for C_card=(B_card+1):size(GroupA,2)
        if strcmp(GroupA(B_card).h,GroupA(C_card).h)==1
            % warning treats its first argument as a format string, so the IDs need %s placeholders
            warning('The h is same in %s & %s',num2str(GroupA(B_card).ID),num2str(GroupA(C_card).ID));
        end
    end
end
It works like this too. Thanks, everyone.

Column - Most frequent letter in a group of 4 rows

I have this column in Excel:
V
V
F
V
C
F
F
F
...
Now I'm reading it with matlab using
[~,txt] = xlsread('2012_15min.xls','JAN','B25:B2999');
Now I want a new column that gives the most frequent letter in each group of 4 rows, so for the first 4 rows I will get V (in this example), and for the second group F.
So I will get a new column with:
V
F
...
I hope you can help me.
You can use the command mode to find the most frequent occurrence. The only caveat is that mode does not work with chars, and xlsread returns txt as a cell array, so convert it with char first. Then reshape it to 4-by-whatever and take the mode of each column (this assumes the number of rows is a multiple of 4):
>> res = char( mode( double( reshape( char(txt), 4, [] ) ) ) ).'
res =
V
F
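To make the grouping logic explicit, here is the same idea as a small Python sketch rather than MATLAB. It is only an illustration: `group_modes` is a hypothetical helper, and ties are broken by taking the smallest letter, which mirrors how MATLAB's mode returns the smallest of equally frequent values.

```python
from collections import Counter

def group_modes(letters, size=4):
    """Return the most frequent item in each consecutive group of `size` items."""
    modes = []
    for start in range(0, len(letters), size):
        counts = Counter(letters[start:start + size])
        top = max(counts.values())
        # On a tie, pick the smallest letter (same rule as MATLAB's mode).
        modes.append(min(k for k, v in counts.items() if v == top))
    return modes

# The eight letters shown in the question.
column = ["V", "V", "F", "V", "C", "F", "F", "F"]
result = group_modes(column)
print(result)
```

Unlike the reshape approach, this version also tolerates a final group with fewer than 4 rows.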

Efficient way of mapping similar inputs to similar outputs

Is there an efficient way of approaching this particular problem in MATLAB?
I am trying to map this matrix or possible array BeansRice (see below)
Beans={0:1,0:1,0:2,0:2,0:2,0:2,0:1,0:1,0:2,0:2}
[a b c d e f g h i j ] = ndgrid(Beans{:})
BeansRice = [a(:) b(:) c(:) d(:) e(:) f(:) g(:) h(:) i(:) j(:)]
into a matrix/array BR (see below)
BR=[abc, de, fg, hij];
where, if columns a, b and c all have value 0 (a tie with a preference), I prefer c over b over a (c>b>a). If columns a, b and c all have value 1 (a tie with no preference), BR(1)=1. If columns a and b have value 0 and column c has value 2, BR(1)=2. If columns a and b have value 1 and column c has value 2, BR(1)=1.
I have an if statement with indexing, but I was wondering whether it could be improved by using the rank/order of the values in the matrix to break ties. I am looking for a more efficient process, as this is only one part of a larger problem.
You can use logical indexing instead of if conditions. For example
BR1(a==1 & b==1 & c==1)=1
BR1(a==0 & b==0 & c==2)=2
BR1(a==1 & b==1 & c==2)=1
...
then process the other parts, BR2(d==... & e>...)=##, then concatenate to obtain what you need
BR=[BR1(:) BR2(:) ...]
etc...

Algorithm for short IDs

I'm looking for an algorithm - or should I better say: encoding? - to compress integer numbers to short string IDs like URL shorteners use: http://goo.gl/0puu
URL-safe base64 comes close, but maybe there is something better.
Requirements:
as short as possible
url safe
"yi_H" called base64 "perfect" and after a bit more research I came to the same conclusion, since only the following characters could be used in URLs without worry:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z
0 1 2 3 4 5 6 7 8 9 - _ . ~
That's 66 characters, whereas base64 uses only 64. The two extra characters wouldn't be practical because 66 is not a power of 2.
Conclusion: URL safe base64 (offered as part of Apache Commons for example) is perfect for short IDs.
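As a rough illustration of that conclusion, an integer can be turned into a short URL-safe ID with Python's standard base64 module. This is a minimal sketch, not what any particular URL shortener does; `encode_id` and `decode_id` are hypothetical helper names, and stripping the "=" padding is an assumption made to keep the IDs short.

```python
import base64

def encode_id(n: int) -> str:
    """Encode a non-negative integer as an unpadded URL-safe base64 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    length = max(1, (n.bit_length() + 7) // 8)  # fewest bytes that hold n
    raw = n.to_bytes(length, "big")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def decode_id(s: str) -> int:
    """Inverse of encode_id: restore the padding, then read the bytes back."""
    padded = s + "=" * (-len(s) % 4)
    return int.from_bytes(base64.urlsafe_b64decode(padded), "big")

print(encode_id(255))             # "_w"
print(decode_id(encode_id(255)))  # 255
```

Because this works on whole bytes, some values get one more character than a direct base-64 digit expansion would need; the trade-off is that it relies only on the standard alphabet of A-Z, a-z, 0-9, "-" and "_".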