How to convert "like each" into a functional form? - kdb

Let's say I have a column of a table whose data type is a character array. I want to pass in a functional select where clause, where the column is in a list of given strings. However, I cannot simply use (in; `col; myList) for reasons. Instead, I need to do the equivalent of:
max col like/: myList
which effectively gives the same result. However, I have tried to put this in functional form
(max; (like/:; `col; myList))
And I am getting a type error. Any ideas on how I could make this work?

A nice trick when dealing with this problem is using parse on a string of the select statement you want to functionalize. For example:
q)parse"select from t where max col like/: myList"
?
`t
,,(max;((/:;like);`col;`myList))
0b
()
Or specifically in your case you want the 3rd element of the result list (the where clause):
q)(parse"select from t where max col like/: myList")2
max ((/:;like);`col;`myList)
I even think using this pattern in your actual code can be a good idea, as functionalized statements like max ((/:;like);`col;`myList) can get pretty unreadable pretty quickly!
Hope that helps!

(any; ((/:;like); `col; enlist,myList))

it should be: (max;((/:;like);`col;`mylist))

Related

I need to change the output of a query so that instead of it coming back as the abbreviation 'em' it says 'employee'. Tsql

I have the correct result coming back. I just need to convert 6 abbreviations in that result to their correct names. There are 20k names assigned to 1 of 6 abbreviated names.
I tried aliasing but that seems to only work for table names.
I tried doing a case statement but that didn't work.
You need to provide more details (like some sample input and output), but if you have data like EM100, and you want to make it EMPLOYEE 100, then you could use an expression such as:
CASE WHEN ColumnName like 'EM%' THEN 'EMPLOYEE ' + SUBSTRING (ColumnName,3,100)
WHEN ColumnName like 'RN%' THEN 'REGNURSE' + SUBSTRING (ColumnName,3,100)
else ColumnName END
But providing more details will help provide a more specific answer.

Access locally scoped variables from within a string using parse or value (KDB / Q)

The following lines of Q code all throw an error, because when the statement "local" is parsed, the local variable is not in the correct scope.
{local:1; value "local"}[]
{[local]; value "local"}[1]
{local:1; eval parse "local"}[]
{[local]; eval parse "local"}[1]
Is there a way to reach the local variable from inside the parsed string?
Note: This is a simplification of the actual problem I'm grappling with, which is to write a function that executes a query, accepting a list of columns which it should return. I imagine the finished product looking something like this:
getData:{[requiredColumns, condition]
value "select ",(", " sv string[requiredColumns])," from myTable where someCol=condition"
}
The condition parameter in this query is the one that isn’t recognised and I do realise I could append it’s value rather than reference it inside a string, but the real query uses lots of local variables including tables etc, so it’s not as easy as just pulling all the variables out of the string before calling value on it.
I'm new to KDB and Q, so if anyone has a better way to achieve the same effect I'm happy to be schooled on the proper way to achieve this outcome in Q. Would still be interested to know in the variable access thing is possible though.
In the first example, you are right that local is not within the correct scope, as value is looking for the global variable local.
One way to get around this is to use a namespace, which will define the variable globally, but can only be accessed by calling that namespace. In the modified example below I have defined local in the .ns namespace
{.ns.local:1; value ".ns.local"}[]
For the problem you are facing with selecting, if requiredColumns is a symbol list of columns you can just use the take operator # to select them.
getData:{[requiredColumns] requiredColumns#myTable}
For more advanced queries using variables you may have to use functional select form, explained here. This will allow you to include variables in the where and by clause of the select statement
The same example in functional form would be (no by clause, only select and where):
getData:{[requiredColumns;condition] requiredColumns:(), requiredColumns;
?[myTable;enlist (=;`someCol;condition);0b;requiredColumns!requiredColumns]}
The first line ensures that requiredColumns is a list even if the user enters a single column name
value will look for a variable in the global scope that's why you are getting an error. You can directly use local variables like you are doing that in your function.
Your function is mostly correct, just need a slight correction to append condition(I have mentioned that below). However, a better approach would be to use functional select in this case.
Using functional select:
q) t:([]id:`a`b; val:3 4)
q) gd: {?[`t;enlist (=;`val;y);0b;((),x)!(),x]}
q) gd[`id;3] / for single column
Output:
id
-
1
q) gd[`id`val;3] / for multiple columns
In case your condition column is of type symbol, then enlist your condition value like:
q) gd: {?[`t;enlist (=;`id;y);0b;((),x)!(),x]}
q) gd[`id;enlist `a]
You can use parse to get a functional form of qsql queries:
q) parse " select id,val from t where id=`a"
?
`t
,,(=;`id;,`a)
0b
`id`val!`id`val
Using String concat(your function):
q)getData:{[requiredColumns;condition] value "select ",(", " sv string[requiredColumns])," from t where id=", .Q.s1 condition}
q) getData[enlist `id;`a] / for single column
q) getData[`id`val;`a] / for multi columns

kdb q - lookup in nested list

Is there a neat way of looking up the key of a dictionary by an atom value if that atom is inside a value list ?
Assumption: The value lists of the dictionary have each unique elements
Example:
d:`tech`fin!(`aapl`msft;`gs`jpm) / would like to get key `fin by looking up `jpm
d?`gs`jpm / returns `fin as expected
d?`jpm / this doesn't work unfortunately
$[`jpm in d`fin;`fin;`tech] / this is the only way I can come up with
The last option does not scale well with the number of keys
Thanks!
You can take advantage of how where operates with dictionaries, and use in :
where `jpm in/:d
,`fin
Note this will return a list, so you might need to do first on the output if you want to replicate what you have above.
Why are you making this difficult on yourself? Use a table!
q)t:([] c:`tech`tech`fin`fin; sym:`aapl`msfw`gs`jpm)
q)first exec c from t where sym=`jpm
You can of course do what you're asking:
first where `jpm in'd
but this doesn't extend well to vectors while the table-approach does!
q)exec c from t where sym in `jpm`gs
I think you can take advantage of the value & key keywords to find what you're after:
q)key[d]where any value[d]in `jpm
,`fin
Hope that helps!
Jemma
The answers you have received so far are excellent. Here's my contribution building on Ryan's answer:
{[val;dict]raze {where y in/:x}[dict]'[val]}[`msft`jpm`gs;d]
The main difference is that you can pass a list of values to be evaluated and the result will be a list of keys.
[`msft`jpm`gs;d]
Output:
`tech`fin`fin

Kdb+/q: Looping over initialised list

Let's say I create an empty table:
test:([] name:`symbol$(); balance:`int$());
Now let's populate this list with one row:
insert[`test;(`John;1001)];
Now if I want to loop over this table as follows:
n:0;
k:0;
f:{x%100}
do[count test; k+:f[test.balance[n]]; n+:1]
Then it gave me an error because it tried to use (evaluate) the empty initialisation value with the function f.
Is there any particular reason why this doesn't work?
And how can I make sure it does work?
What you're doing may work but it's far from best practice. Loops and indices are not the way to go.
What you're looking for is essentially
test:([] name:`symbol$(); balance:`int$());
insert[`test;(`John;1001)];
insert[`test;(`Jane;2002)];
q)select sum f[balance] from test
balance
-------
30.03

dataFrame keying using pandas groupby method

I new to pandas and trying to learn how to work with it. Im having a problem when trying to use an example I saw in one of wes videos and notebooks on my data. I have a csv file that looks like this:
filePath,vp,score
E:\Audio\7168965711_5601_4.wav,Cust_9709495726,-2
E:\Audio\7168965711_5601_4.wav,Cust_9708568031,-80
E:\Audio\7168965711_5601_4.wav,Cust_9702445777,-2
E:\Audio\7168965711_5601_4.wav,Cust_7023544759,-35
E:\Audio\7168965711_5601_4.wav,Cust_9702229339,-77
E:\Audio\7168965711_5601_4.wav,Cust_9513243289,25
E:\Audio\7168965711_5601_4.wav,Cust_2102513187,18
E:\Audio\7168965711_5601_4.wav,Cust_6625625104,-56
E:\Audio\7168965711_5601_4.wav,Cust_6073165338,-40
E:\Audio\7168965711_5601_4.wav,Cust_5105831247,-30
E:\Audio\7168965711_5601_4.wav,Cust_9513082770,-55
E:\Audio\7168965711_5601_4.wav,Cust_5753907026,-79
E:\Audio\7168965711_5601_4.wav,Cust_7403410322,11
E:\Audio\7168965711_5601_4.wav,Cust_4062144116,-70
I loading it to a data frame and the group it by "filePath" and "vp", the code is:
res = df.groupby(['filePath','vp']).size()
res.index
and the output is:
[E:\Audio\7168965711_5601_4.wav Cust_2102513187,
Cust_4062144116, Cust_5105831247,
Cust_5753907026, Cust_6073165338,
Cust_6625625104, Cust_7023544759,
Cust_7403410322, Cust_9513082770,
Cust_9513243289, Cust_9702229339,
Cust_9702445777, Cust_9708568031,
Cust_9709495726]
Now Im trying to approach the index like a dict, as i saw in examples, but when im doing
res['Cust_4062144116']
I get an error:
KeyError: 'Cust_4062144116'
I do succeed to get a result when im putting the filepath, but as i understand and saw in previouse examples i should be able to use the vp keys as well, isnt is so?
Sorry if its a trivial one, i just cant understand why it is working in one example but not in the other.
Rutger you are not correct. It is possible to "partial" index a multiIndex series. I simply did it the wrong way.
The index first level is the file name (e.g. E:\Audio\7168965711_5601_4.wav above) and the second level is vp. Meaning, for each file name i have multiple vps.
Now, this is correct:
res['E:\Audio\7168965711_5601_4.wav]
and will return:
Cust_2102513187 2
Cust_4062144116 8
....
but trying to index by the inner index (the Cust_ indexes) will fail.
You groupby two columns and therefore get a MultiIndex in return. This means you also have to slice using those to columns, not with a single index value.
Your .size() on the groupby object converts it into a Series. If you force it in a DataFrame you can use the .xs method to slice a single level:
res = pd.DataFrame(df.groupby(['filePath','vp']).size())
res.xs('Cust_4062144116', level=1)
That works. If you want to keep it as a series, boolean indexing can help, something like:
res[res.index.get_level_values(1) == 'Cust_4062144116']
The last option is a bit less readable, but sometimes also more flexibile, you could test for multiple values at once for example:
res[res.index.get_level_values(1).isin(['Cust_4062144116', 'Cust_6073165338'])]