How can I query the same column in a kdb table multiple times in a single statement? - kdb

I have the following table in kdb...
p:([]r:("(A|A(A|B|C|D).*)";"A(E|F|G|H|I).*";"A(J|K|L|M).*";"A(N|O|P|Q|R|S).*";"A(T|U|V|W|X|Y|Z).*";"B.*";"(C|C(A|B|C|D|E).*)";"C(F|G|H|I|J|K).*";"C(L|M|N|O|P|Q|R).*";"C(S|T|U|V|W|X|Y|Z).*";"D.*"))
r
----------------------
"(A|A(A|B|C|D).*)"
"A(E|F|G|H|I).*"
"A(J|K|L|M).*"
"A(N|O|P|Q|R|S).*"
"A(T|U|V|W|X|Y|Z).*"
"B.*"
"(C|C(A|B|C|D|E).*)"
"C(F|G|H|I|J|K).*"
"C(L|M|N|O|P|Q|R).*"
"C(S|T|U|V|W|X|Y|Z).*"
"D.*"
and the below function that parses each row of the table...
getRange:{$[x like "*(*";
[if[x like "(*"; x2:1#1_x; l:enlist x2; x:-1_(3_x)];
l,:enlist {(3#x),"-",(-3#x)} ssr[ssr[ssr[x;".";""];")";"]"];"(";"["];
if[((count l)>1)&(l[1] like "*A-*"); l[1]:ssr[l[1]; "A-";"0-9/A-"]];
:l];
:enlist ssr[x;".";""]
];
}
Which gives an output like this...
r1:raze getRange'[exec r from p]
q)r1
,"A"
"A[0-9/A-D]*"
"A[E-I]*"
"A[J-M]*"
"A[N-S]*"
"A[T-Z]*"
"B*"
,"C"
"C[0-9/A-E]*"
"C[F-K]*"
"C[L-R]*"
"C[S-Z]*"
"D*"
I'm parsing the rows so they can be inserted into a query similar to something like select from t where sym like raze getRange'[exec r from p][0]
What I'd like to be able to do is combine the first "single A" with the first "group of A" and the same with the C's (so it looks like below). But the problem I'm having is that those results can't be easily inserted into a query...
(,"A";"A[0-9/A-D]*")
,"A[E-I]*"
,"A[J-M]*"
,"A[N-S]*"
,"A[T-Z]*"
,"B*"
(,"C";"C[0-9/A-E]*")
,"C[F-K]*"
,"C[L-R]*"
,"C[S-Z]*"
,"D*"
Is there a way in q that I can do this? Essentially, select from t where sym like (enlist "A";"A[0-9/A-D]*")
Please let me know if you need any additional info. Thank you in advance.

To match against multiple patterns, you can use each-right (/:) with like and combine the results with any:
select from t where any sym like/:("A";"A[0-9/A-D]*")
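To get one result set per combined group, as the question asks, the same expression can be wrapped in a function and applied to each group of patterns. This is only a sketch: the table t, its contents, and the names groups and matchGroup are made up for illustration.

```q
/ hypothetical table; only the sym column matters here
t:([] sym:`A`AB`AE`B1`C; px:10 20 30 40 50)

/ one entry per desired group; a single pattern is enlisted so that
/ every group is a list of strings
groups:((enlist "A";"A[0-9/A-D]*"); enlist "A[E-I]*"; enlist "B*")

/ a row qualifies if sym matches any pattern in the group
matchGroup:{[pats] select from t where any sym like/:pats}

res:matchGroup each groups   / list of tables, one per group
count each res               / 2 1 1
```

Keeping every group as a list of strings, even when it holds a single pattern, means like/: always returns a list of boolean vectors and any collapses them the same way for every group.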

Related

oracle: grouping on merged columns

I have two tables. FIRST:
id,rl_no,adm_date,fees
1,123456,14-11-10,100
2,987654,10-11-12,30
3,4343,14-11-17,20
and SECOND
id,rollno,fare,type
1,123456,20,bs
5,634452,1000,bs
3,123456,900,bs
4,123456,700,bs
My requirement is twofold.
1. I first need to get all columns from both tables where rl_no matches, so I used:
SELECT a.ID,a.rl_no,a.adm_date,a.fees,b.rollno,b.fare,b.type FROM FIRST a
INNER JOIN
SECOND b ON a.rl_no = b.rollno
The output is like this:
id,rl_no,adm_date,fees,rollno,fare,type
1,123456,14-11-10,100,123456,20,bs
1,123456,10-11-12,100,123456,900,bs
1,123456,14-11-17,100,123456,700,bs
2. Next, I want the sum(fare) for those rollno values that are common to both tables and whose fare >= fees from the FIRST table, grouped by rollno and id.
My query is:
SELECT x.ID,x.rl_no,x.adm_date,x.fees,x.rollno,x.type,sum(x.fare) as "fare" from (SELECT a.ID,a.rl_no,a.adm_date,a.fees,b.rollno,b.fare,b.type FROM FIRST a
INNER JOIN
SECOND b ON a.rl_no = b.rollno) x, FIRST y
WHERE x.rollno = y.rl_no AND x.fare >= y.fees AND x.type IS NOT NULL GROUP BY x.rollno,x.ID ;
But this throws an exception:
ORA-00979: not a GROUP BY expression
00979. 00000 - "not a GROUP BY expression"
The expected output will be like this:
id,rollno,adm_date,fare,type
1,123456,14-11-10,1620,bs
Could someone show an Oracle newbie what I'm doing wrong here?
It looks like there are a couple of different problems here.
Firstly, you're trying to group by an x.ID column which doesn't exist; it looks like you'll want to add ID to the selected columns in your sub-query.
Secondly, when aggregating with GROUP BY, every selected column must either be listed in the GROUP BY clause or be aggregated. If you're grouping by rollno and ID, what do you want to happen to all the extra values for adm_date, fees, and type? Are those always going to be the same for each distinct rollno and ID pair?
If so, simply add them to the GROUP BY clause, i.e.,
GROUP BY adm_date, fees, type, rollno, ID
If not, you'll need to work out exactly how to choose which value is output. Suppose you've got output like your example (adding in an ID column here):
ID,adm_date,fees,rollno,fare,type
1,14-11-10,100,123456,20,bs
1,10-11-12,100,123456,900,bs
1,14-11-17,100,123456,700,bs
Call that result set 'a'. If I run:
SELECT a.ID, a.rollno, SUM(a.fare) as total_fare
FROM a
GROUP BY a.ID, a.rollno
Then the result will be a single row:
ID,rollno,total_fare
1,123456,1620
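So, if you also select the adm_date, fees, and type columns, Oracle has no idea what you mean to do with them. You're not using them for grouping, and you're not telling Oracle how to pick which one to use.
You could do something like the following. Treat FIRST(...) as pseudocode for "pick one value per group": Oracle has no plain FIRST aggregate, so in practice you would use MIN, MAX, or MIN(...) KEEP (DENSE_RANK FIRST ORDER BY ...).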
SELECT a.ID,
FIRST(a.adm_date) as first_adm_date,
FIRST(a.fees) as first_fees,
a.rollno,
SUM(a.fare) as total_fare,
FIRST(a.type) as first_type
FROM a
GROUP BY a.ID, a.rollno
Which would give the result:
ID,first_adm_date,first_fees,rollno,total_fare,first_type
1,14-11-10,100,123456,1620,bs
I'm not sure if that's what you mean to do though.

string query in a function in kdb

func:{[query] value query};
query is a parameter of my function. The function applies something like delete xxx, yyyy from (value query) plus some further manipulation. I am not sure why, but the function doesn't work unless I call value on the query: it says it cannot find the table. So I have to use value query inside the function and pass the query in as a string such as "select from tab".
My question is: how do I pass the query if the filter contains a string too?
func["select from tab where a="abc""] <<< this does not work
How can I make a string inside a string work?
Also, I'm not sure why
func["select from tab where date = max date"] fails with a length error,
but func["100#select from tab where date = max date"] works?
The whole function is
getTable:{[query]loadHDB[];.Q.view date where date < .z.D-30;tab:(delete xxxx,yyyyy,sub,ID,subID,tID,subTID,text,gID from((value query)));remove[];update {";"sv #[s;where (s:";"vs x) like "cId=*";:;enlist""]}each eData from (update {";"sv #[s;where (s:";"vs x) like "AId=*";:;enlist""]}each eData from tab)};
remove:{[]delete tab from `.};
loadHDB:{[]value "\\l /hdb"};
You can escape the quotes using backslash http://code.kx.com/wiki/Reference/BackSlash#escape
func["select from tab where a like \"abc\""]
Edit:
If tab is a HDB table then this length error could point to a column length issue (which 100# is avoiding). What does the following return?
q)checkPartition:{[dt] a!{c!{count get x} each ` sv' x,/:c:({x where not x like "*#"} key[x])except `.d}each a:(` sv' d,/:key[d:hsym `$string dt])};
q)check:checkPartition last date
q)(where{1<count distinct value x}each check)#check
I like using -3! and also -1 to print the result. If you know what your query should look like if executed from the console then after you construct your string, use -1 to print the string. It should print the query as how it would be executed by the console.
q)stst:-3!
q)"select max age by user from tab where col1 like ",stst"Hello"
"select max age by user from tab where col1 like \"Hello\""
q)/then to view how it will be executed, use -1
q)-1"select max age by user from tab where col1 like ",stst"Hello";
select max age by user from tab where col1 like "Hello"
q)/looks good

KDB string concatenation with symbol list for dynamic query

In this link, there is an example of how to include a dynamic parameter, d, in a KDB select query:
h: hopen`:myhost01:8012 // open connection
d: 2016.02.15 // define date var
symList: `GBPUSD`EURUSD
h raze "select from MarketDepth where date=", string d, ", sym in `GBPUSD`EURUSD" // run query with parameter d
Here d is of type date and is easy to string concatenate in order to generate a dynamic query.
If I want to add symList as a dynamic parameter as well by converting to string:
raze "select from MarketDepth where date=", string d, ", sym in ", string symList
The concatenated string becomes: select from MarketDepth where date=2016.02.15, sym in GBPUSDEURUSD, in other words the string concatenation loses the backticks so the query does not run. How can I solve this?
P.S. I know about functional querying, but after failing for 2 hours I have given up on that.
No need for functional selects.
q)MarketDepth:([] date:9#2016.02.15; sym:9#`A`B)
q)d:2016.02.15
q)symList:`B
q)h ({[dt;sl] select from MarketDepth where date=dt,sym in sl}; d; symList)
date sym
--------------
2016.02.15 B
2016.02.15 B
2016.02.15 B
2016.02.15 B
You are right, string SYMBOL does not preserve a backtick character, so you'll have to append it yourself like this:
symList: `GBPUSD`EURUSD
strSymList: "`",'string symList / ("`GBPUSD";"`EURUSD")
I used join , with each-both adverb ' to join a backtick with each element of a list. Having your symbol list stringified your dynamic query becomes
"select from MarketDepth where date=", (string d), ", sym in ",raze"`",'string symList
You can also use parse to see what the functional form of your query looks like.
q) parse "select from MarketDepth where date=", (string d), ", sym in ",raze"`",'string symList
(?;`MarketDepth;enlist ((=;`date;2016.02.15);(in;`sym;enlist `GBPUSD`EURUSD));0b;())
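Now it's easy to create a functional select. Note that the outer enlist around the where clause in the parse output is dropped when the constraints are passed directly to ?:
?[`MarketDepth;((=;`date;2016.02.15);(in;`sym;enlist symList));0b;()]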
Hope this helps.
Update: @Ryan Hamilton's solution is probably the best in your particular scenario. You can even make a table name an argument if you want:
h({[t;d;s]select from t where date=d,sym in s};`MarketDepth; d; symList)
But it is worth noting that you can't use this technique when you need to make a list of columns dynamic. The following will NOT work:
h({[c;d;s]select c from t where date=d,sym in s};`time`sym; d; symList)
You will have to either build a dynamic select expression like you do or use functional forms.
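For the dynamic column list case, a functional select might look like the sketch below. The local MarketDepth table and its contents are invented for illustration, and the remote call is shown only as a comment because it assumes an open handle h.

```q
/ hypothetical local stand-in for the remote MarketDepth table
MarketDepth:([] date:4#2016.02.15; time:09:00 09:01 09:02 09:03; sym:`GBPUSD`EURUSD`USDJPY`GBPUSD; px:1.1 1.2 1.3 1.4)

d:2016.02.15
symList:`GBPUSD`EURUSD
c:`time`sym          / columns to return; use (),c if c may be a single symbol

/ arguments: table name; list of where constraints (parse trees);
/ 0b for no grouping; dictionary of output names to column expressions
?[`MarketDepth; ((=;`date;d);(in;`sym;enlist symList)); 0b; c!c]

/ the same idea over IPC, assuming h is an open handle:
/ h({[c;d;s] ?[`MarketDepth;((=;`date;d);(in;`sym;enlist s));0b;c!c]}; c; d; symList)
```

Here enlist symList keeps the symbol list from being interpreted as column names when the constraint is evaluated, and c!c maps each requested column name to itself.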
Others have already given good alternative approaches for your problem. But if you need to join strings and symbols (or other data types) without losing the backtick, the function .Q.s1 does the job.
q).Q.s1 `a`b
"`a`b"
q)"select from table where sym in ",.Q.s1 symList
Note: Generally it is not suggested to use .Q namespace functions.

Get substring into a new column

I have a table that contains a column with data in the following format. Let's call the column "title" and the table "s".
title
ab.123
ab.321
cde.456
cde.654
fghi.789
fghi.987
I am trying to get a unique list of the characters that come before the "." so that I end up with this:
ab
cde
fghi
I have tried selecting the initial column into a table and then doing an update that creates a new column holding the position of the dot, using "ss".
Something like this:
t: select title from s
update thedot: (title ss `.)[0] from t
I was then going to add a third column containing the first N characters of "title", where N is the value stored in the "thedot" column.
All I get when I try the update is a "type" error.
Any ideas? I am very new to kdb, so I am no doubt doing something simple in a very silly way.
You get the type error because ss only works on strings, not symbols. Also, ss is not a vector function, so you need to combine it with each-both (').
q)update thedot:string[title] ss' "." from t
title thedot
---------------
ab.123 2
ab.321 2
cde.456 3
cde.654 3
fghi.789 4
There are a few ways to solve your problem:
q)select distinct(`$"." vs' string title)[;0] from t
x
----
ab
cde
fghi
q)select distinct(` vs' title)[;0] from t
x
----
ab
cde
fghi
You can read here for more info: http://code.kx.com/q/ref/casting/#vs
An alternative is to use the 0: operator to parse around the "." delimiter. This operator is especially useful if you have a fixed number of 'columns', as in a csv file. In this case, where there is a fixed number of columns and we only want the first, the distinct prefixes before the "." can be returned with:
exec distinct raze("S ";".")0:string title from t
`ab`cde`fghi
OR:
distinct raze("S ";".")0:string t`title
`ab`cde`fghi
Here "S " defines the type of each column (read the first as a symbol, skip the second) and "." is the field delimiter. For records with differing numbers of columns it would be better to use the vs operator.
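As a rough illustration of that last point, vs copes with a varying number of dot-separated fields because it simply splits each string and the first piece is taken afterwards. The titles list below is made up for the example.

```q
/ hypothetical titles with zero, one and two dots
titles:`ab.123`cde.456.x`fghi`ab.321

/ split every string on ".", keep the first piece, cast back to symbol
prefixes:`$first each "." vs/: string titles

distinct prefixes
```

A title with no dot at all comes back unchanged, since splitting it yields a single piece.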
A variation of WooiKent's answer using each-right (/:) :
q)exec distinct (` vs/:title)[;0] from t
`ab`cde`fghi

KDB: select rows based on value in one column being contained in the list of another column

A very simple, silly question. Consider the following table:
tt:([]Id:`6`7`12 ;sym:`A`B`C;symlist:((`A`B`M);(`X`Y`Z);(`H`F`C)))
Id sym symlist
---------------
6 A `A`B`M
7 B `X`Y`Z
12 C `H`F`C
I would like to select all rows in tt where the element in sym is contained in the list symlist. In this case, it means just the first and third rows. However, the following query gives me a type error.
select from tt where sym in symlist
(`type)
What's the proper way to do this? Thanks.
You want to use the ' (each-both) adverb, so that they "pair up", so to speak. Recall that sym is just a list, and symlist is a list of lists. You want to check each element of sym against the corresponding sub-list in symlist. You do this by telling in to "pair up" its arguments.
q)tt:([]id:6712; sym:`A`B`C; symlist:(`A`B`M;`X`Y`Z;`H`F`C))
q)select from tt where sym in'symlist
id sym symlist
----------------
6712 A A B M
6712 C H F C
It's not entirely clear to me why your query results in a type error, so I'd be interested in hearing other people's responses.
q)select from tt where sym in symlist
'type
in
`A`B`C
(`A`B`M;`X`Y`Z;`H`F`C)
q)select from tt where {x in y}[sym;symlist]
id sym symlist
--------------
In response to JPC's answer (couldn't format this as a comment)....
Type error possibly caused by applying "where" to a scalar boolean
q)(`a`b`c) in (`a`g`b;`u`i`o;`g`c`t)
0b
q)where (`a`b`c) in (`a`g`b;`u`i`o;`g`c`t)
'type
Also, the reason the {x in y} lambda doesn't cause the error is because the "in" is obscured and is not visible to the parser (parser doesn't look inside lambdas)
q)0N!parse"select from tt where {x in y}[sym;symlist]";
(?;`tt;,,({x in y};`sym;`symlist);0b;())
Whereas the parser can "see" the "in" in the first case
q)0N!parse"select from tt where sym in symlist";
(?;`tt;,,(in;`sym;`symlist);0b;())
I'm guessing the parser tries to do some optimisations when it sees the "in"