SAS collapse data by identifiers

I have a bunch of data that I would like to collapse by a few identifier variables and keep only the non-missing values of other variables. For each unique combination of id, title, info there is 1 value of var1/var2/var3 that isn't missing that I would like to keep. Note that var3 is numeric while var1/var2 are character.
I have data like:
id  title  info                var1      var2      var3
1   foo    Some string here    string 1
1   foo    Some string here              string 2
1   foo    Some string here                        number 3
2   bar    A different string  string 4  string 5
2   bar    A different string                      number 6
3   baz    Something else      string 7            number 8
And I want it to be like:
id  title  info                var1      var2      var3
1   foo    Some string here    string 1  string 2  number 3
2   bar    A different string  string 4  string 5  number 6
3   baz    Something else      string 7            number 8
Thanks!

The UPDATE statement can handle that. A missing value in the transaction dataset does not overwrite the existing value, so the last non-missing value of each variable is kept. The UPDATE statement takes exactly two datasets, the master and the transaction, and the master dataset must have at most one observation per BY group. You can still use your single dataset for both roles: the OBS=0 dataset option turns it into an empty master dataset. The data must be sorted by the BY variables.
First here is your sample data.
data have ;
infile cards dsd truncover ;
length id 8 title info var1-var3 $20 ;
input id -- var3 ;
cards;
1,foo,Some string here,string 1,,
1,foo,Some string here,,string 2,
1,foo,Some string here,,,number 3
2,bar,A different string,string 4,string 5,
2,bar,A different string,,,number 6
3,baz,Something else,string 7,,number 8
;;;;
Here is the step to collapse.
data want ;
update have(obs=0) have ;
by id title info;
run;
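
For readers who do not use SAS, the effect of that step can be sketched in Python. This is only an illustration of the "last non-missing value wins" rule, not how SAS implements UPDATE. The `collapse` helper and the use of an empty string for a missing value are assumptions made for this sketch.

```python
import csv
import io

# Same sample data as the SAS step above; an empty field stands for a missing value.
SAMPLE = """\
1,foo,Some string here,string 1,,
1,foo,Some string here,,string 2,
1,foo,Some string here,,,number 3
2,bar,A different string,string 4,string 5,
2,bar,A different string,,,number 6
3,baz,Something else,string 7,,number 8
"""
FIELDS = ["id", "title", "info", "var1", "var2", "var3"]


def collapse(rows, keys):
    """Keep one row per key combination; the last non-missing value of each field wins."""
    merged = {}
    for row in rows:
        key = tuple(row[k] for k in keys)
        # The first row of a group becomes the starting point for that group.
        target = merged.setdefault(key, dict(row))
        for name, value in row.items():
            if value != "":  # a missing value never overwrites an existing one
                target[name] = value
    return list(merged.values())


rows = list(csv.DictReader(io.StringIO(SAMPLE), fieldnames=FIELDS))
want = collapse(rows, ["id", "title", "info"])
for row in want:
    print(row)
```

Unlike the SAS BY statement, this sketch does not need sorted input, because it looks groups up in a dictionary.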

Related

KDB How to update column values

I have a table with a column of symbol type, like below.
Name    Value
First   TP_RTD_FRV
Second  RF_QWE_FRV
Third   KF_FRV_POL
I need to update it as shown below: wherever a value contains FRV, I need to replace it with AB_FRV. How can I achieve this?
Name    Value
First   TP_RTD_AB_FRV
Second  RF_QWE_AB_FRV
Third   KF_AB_FRV_POL

q)t
name v
---------------
0 TP_RTD_FRV
1 RF_QWE_FRV
2 KF_FRV_POL
3 THIS
4 THAT
q)update `$ssr[;"FRV";"AB_FRV"]each string v from t
name v
------------------
0 TP_RTD_AB_FRV
1 RF_QWE_AB_FRV
2 KF_AB_FRV_POL
3 THIS
4 THAT
or, without using qSQL, with amend:
q)@[t;`v;]{`$ssr[;"FRV";"AB_FRV"]each string x}
name v
------------------
0 TP_RTD_AB_FRV
1 RF_QWE_AB_FRV
2 KF_AB_FRV_POL
3 THIS
4 THAT
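If the column contains many repeated values, you might benefit from .Q.fu, which applies the function to the distinct values only and maps the results back (timings from \t are in milliseconds):
q)t:1000000#t
q)\t @[t;`v;]{`$ssr[;"FRV";"AB_FRV"]each string x}
2343
q)\t @[t;`v;].Q.fu {`$ssr[;"FRV";"AB_FRV"]each string x}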
10
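
The idea behind that speed-up is not specific to q. Below is a minimal Python sketch of the same technique: run an expensive function once per distinct value and reuse the result for the repeats. The names `apply_to_unique` and `add_prefix` are made up for this illustration and are not part of any kdb+ API.

```python
def apply_to_unique(func, values):
    """Call func once per distinct value and reuse the result for repeats."""
    cache = {}
    out = []
    for value in values:
        if value not in cache:
            cache[value] = func(value)
        out.append(cache[value])
    return out


def add_prefix(name):
    # Stand-in for the ssr call above: replace FRV with AB_FRV.
    return name.replace("FRV", "AB_FRV")


# Five distinct values repeated many times, like t:1000000#t above.
values = ["TP_RTD_FRV", "RF_QWE_FRV", "KF_FRV_POL", "THIS", "THAT"] * 1000
result = apply_to_unique(add_prefix, values)
print(result[:5])
```

With only five distinct values, `add_prefix` runs five times no matter how long the list is, which is why the gain depends on how repetitive the data is.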

How to convert a symbol to a string in kdb+?

For example, if I have a list of symbols i.e (`A.ABC;`B.DEF;`C.GHI) or (`A;`B;`C), how could I convert each item in the list to a string?

string will convert them. It's an atomic function, so it is applied to each symbol in the list.
q)string (`A.ABC;`B.DEF;`C.GHI)
"A.ABC"
"B.DEF"
"C.GHI"

You can use the keyword string to do this.
q)lst:(`A;`B;`C)
// convert to list of strings
q)string lst
,"A"
,"B"
,"C"

As the others have mentioned, string is what you're after. In your example, if you want to split each symbol into the prefix and suffix on either side of the . you can do
q)a:(`A.ABC;`B.DEF;`C.GHI)
q)` vs' a
A ABC
B DEF
C GHI
and if you want to convert these to strings you can just use string again on the above.

q)string each (`A.ABC;`B.DEF;`C.GHI)
"A.ABC"
"B.DEF"
"C.GHI"

Thanks all, useful answers! While I was trying to solve this on my own in parallel, I came across ($) that appears to work as well.
q)example:(`A;`B;`C)
q)updatedExample:($)example;
q)updatedExample
enlist "A"
enlist "B"
enlist "C"

Use the string function.
q)d
employeeID firstName lastName
-----------------------------------------------------
1001 Employee 1 First Name Employee 1 Last Name
1002 Employee 2 First Name Employee 2 Last Name
q)update firstName:string(firstName) from `d
`d
q)d
employeeID firstName lastName
-------------------------------------------------------
1001 "Employee 1 First Name" Employee 1 Last Name
1002 "Employee 2 First Name" Employee 2 Last Name

readtable on text file ignores first row which contains the column names

I have a tab delimited text file which contains some data organised into columns with the first row acting as column names such as:
TN Stim Task RT
1 A A 500.2
2 B A 569
3 C A 654
and so on.
I am trying to read this text file into MATLAB (R2018a) using readtable with
Data1 = readtable(filename);
All the data ends up in the Data1 table, but the column names show as Var1, Var2, etc. If I use a name-value pair to tell readtable to read the first row as column names, as in:
Data1 = readtable(filename, 'ReadVariableNames', true);
then the first data row is used as the column names instead, i.e.
1 A A 500.2
So it looks like the first row is being ignored completely. How can I modify the readtable call to use the entries in the first row as column names?

I figured it out. It appears there was an additional tab in some of the rows after the last column. Because of this, readtable was reading it as an additional column, but did not have a column name to assign to it. It seems that if any of the column names are missing, it names them all as Var1, Var2, etc.
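
A quick way to check for this kind of problem is to compare the number of delimited fields in each line with the number in the header line. The Python sketch below assumes a tab-delimited file. The function name and the sample lines, including the stray trailing tab on the third line, are made up for illustration.

```python
def find_mismatched_rows(lines, delimiter="\t"):
    """Return (line_number, field_count) for lines whose field count differs from the header."""
    counts = [len(line.rstrip("\r\n").split(delimiter)) for line in lines]
    header_count = counts[0]
    return [
        (number, count)
        for number, count in enumerate(counts, start=1)
        if count != header_count
    ]


# Hypothetical file contents: line 3 ends with an extra tab.
sample = [
    "TN\tStim\tTask\tRT\n",
    "1\tA\tA\t500.2\n",
    "2\tB\tA\t569\t\n",
    "3\tC\tA\t654\n",
]
print(find_mismatched_rows(sample))  # → [(3, 5)]
```

To check a real file, pass `open(filename).readlines()` instead of the sample list.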

Based on the way your sample file text is formatted above, it appears that the column labels are separated by spaces instead of by tabs the way the data is. In this case, readtable will assume (based on the data) that the delimiter is a tab and treat the column labels as a header line to skip. Add tabs between them and you should be good to go.
Test with spaces between column labels:
% File contents:
TN Stim Task RT
1 A A 500.2
2 B A 569
3 C A 654
>> Data1 = readtable('sample_table.txt')
Data1 =
Var1 Var2 Var3 Var4 % Default names
____ ____ ____ _____
1 'A' 'A' 500.2
2 'B' 'A' 569
3 'C' 'A' 654
Test with tabs between column labels:
% File contents:
TN Stim Task RT
1 A A 500.2
2 B A 569
3 C A 654
>> Data1 = readtable('sample_table.txt')
Data1 =
TN Stim Task RT
__ ____ ____ _____
1 'A' 'A' 500.2
2 'B' 'A' 569
3 'C' 'A' 654

Scenario based questions in Datastage

I have two scenario based questions here.
Question 1
Input Dataset
Col1
A
A
B
C
C
B
D
A
C
Output Dataset
Col1 Col2
A 1
A 2
A 3
B 1
B 2
C 1
C 2
C 3
D 1
Question2
Input data string
AA-BB-CC-DD-EE-FF (can be of any delimiter and string can have any length)
Output data string
string 1 -> AA
string 2 -> BB
string 3 -> CC
string 4 -> DD
Thanks & Regards,
Subhasree

Question 1: This can be solved with a Transformer stage. Sort the data and use the LastRowInGroup functionality.
For Col2, create a counter as a stage variable and add 1 for each row, then reset it with a second stage variable once LastRowInGroup is reached.
Alternatively, you could generate a row number column in SQL.
Question 2: You have not provided enough information. Is string 1 a column or a row? If you do not know anything up front about the structure (any delimiter), this will get hard...
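
The counter logic for Question 1 can be sketched outside DataStage as well. The Python snippet below is only an illustration of "sort, then count within each group". It resets the counter when the key changes rather than using a last-row-in-group flag, and the function name is invented for this example.

```python
def number_within_groups(values):
    """Sort the values and number the rows within each group, starting at 1."""
    numbered = []
    previous = None
    counter = 0
    for value in sorted(values):
        # Restart the counter whenever a new group begins.
        counter = counter + 1 if value == previous else 1
        numbered.append((value, counter))
        previous = value
    return numbered


col1 = ["A", "A", "B", "C", "C", "B", "D", "A", "C"]
for value, position in number_within_groups(col1):
    print(value, position)
```

For the sample input this prints the pairs from the expected output dataset, from A 1 through D 1.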

crystal report formula, loop on some data to write them in a single string value

I have a Crystal Report that returns data from a query like the following:
Quantity type
1 cat
2 dogs
5 birds...
I want to write a formula that shows them in a single string value, like:
1 cat; 2 dogs; 5 birds;
How do I write this formula?

stringvar A;
A := A & totext({Quantity}, "#") & " " & {type} &";";
A
