Applying dictionary to dictionary - kdb

Recently I've found a technique of applying dict to dict. It is something like this:
(3 4 5!6 7 8)[(`a`b)!(2 3)] ~ (`a`b!0N 6)
or even this, which looks more natural for reading from left to right:
(#[;(`a`b)!(2 3)](3 4 5!6 7 8)) ~ (`a`b!0N 6)
Can I use this behavior (are there any caveats)? Is it described somewhere in the official docs?
It seems like absolutely mind-blowing technique: we definitely apply function(list/dictionary/table) to it's argument, not just pass argument to a function.

I think it's reasonable if you view a dictionary as behaving like a function which takes keys as input and outputs values.
So in the same way that a function that doubles would double the values of your input while retaining the keys of your input:
q)f1:2*
q)f1[`a`b!2 3]
a| 4
b| 6
a "function" (dictionary) that takes keys and outputs values would do similar:
q)f2:3 4 5!6 7 8
q)f2[`a`b!2 3]
a|
b| 6

Related

SPSS/macro: split string into multiple variables

I am trying to split a string variable into multiple dummy coded variables. I used these sources to get an idea of how one would achieve this task in SPSS:
https://www.ibm.com/support/pages/making-multiple-string-variables-single-multiply-coded-field
https://www.spss-tutorials.com/spss-split-string-variable-into-separate-variables/
But when I try to adapt the first one to my needs or when I try to convert the second one to a macro, I fail.
In my dataset I have (multiple) variables that contain a comma seperated string that represents different combinations of selected items (as well as missing values). For each item of a specific variable I want to create a dummy variable. If the item was selected, it should be represented with a 1 in the new dummy variable. If it was not selected, that case should be represented with a 0.
Different input variables can contain different numbers of items.
For example:
ID
VAR1
VAR2
DMMY1_1
DMMY1_2
DMMY1_3
1
1, 2
8
1
1
0
2
1
1, 3
1
0
0
3
3, 1
2, 3, 1
1
0
1
4
2, 8
0
0
0
Here is what I came up with so far ...
* DEFINE DATA.
DATA LIST /ID 1 (F) VAR1 2-5 (A) VAR2 6-12 (A).
BEGIN DATA
11, 28
21 1, 3
33, 12, 3, 1
4 2, 8
END DATA.
* MACRO SYNTAX.
* DEFINE VARIABLES (in the long run these should/will be inside the macro function, but for now I will leave them outside).
NUMERIC v1 TO v3 (F1).
VECTOR v = v1 TO v3.
STRING #char (A1).
DEFINE split_var(vr = TOKENS(1)).
!DO !#pos=1 !TO char.length(!vr).
COMPUTE #char = char.substr(!vr, !#pos, 1).
!IF (!#char !NE "," !AND !#char !NE " ") !THEN
COMPUTE v(NUMBER(!#char, F1)) = 1.
!IFEND.
!DOEND.
!ENDDEFINE.
split_var vr=VAR1.
EXECUTE.
As I got more errors than I can count, it's hard to narrow down my problem. But I think the problem has something to do with the way I use the char.length() function (and I am a bit confused when to use the bang operator).
If anyone has some insights, I would really appreciate some help :)
There is a fundamental issue to understand about SPSS macro - the macro does not read or interact in any way with the data. All the macro does is manipulate text to write syntax. The syntax created will later work on the actual data when you run it.
So, for example, Your first error is using char.length(!vr) within the syntax. You are trying to get the macro to read the data, calculate the length and use, but that simply can't be done - the macro can only work with what you gave it.
Another example in your code: you calculate #char and then try to use it in the macro as !#char. So that obviously won't work. ! precedes only macro functions or arguments. #char, in your code, is neither, and it can't become one - can't read the data into the macro...
To give you a litte push forward: I understand you want the macro loop to run a different number of times for each variable, but you can't use char.length(!vr). I suggest instead have the macro loop as many times as necessary to be sure you can deal with the longest variable you'll need to work with.
And another general strategy hint - first, create syntax to deal with one specific variable and one specific delimiter. Once this works, start working on a macro, keeping in mind that the only purpose of the macro is to recreate the same working syntax, only changing the parameters of variable name and delimiter.
With my new understanding of the SPSS macro logic (thanks to #eli-k) the problem was quite easy to solve. Here is the working solution.
* DEFINE DATA.
DATA LIST /ID 1 (F) VAR1 2-5 (A) VAR2 6-12 (A).
BEGIN DATA
11, 28
21 1, 3
33, 12, 3, 1
4 2, 8
END DATA.
* DEFINE MACRO.
DEFINE #split_var(src_var = !TOKENS(1)
/dmmy_var_label = !DEFAULT(dmmy) !TOKENS(1)
/dmmy_var_lvls = !TOKENS(1))
NUMERIC !CONCAT(!dmmy_var_label,1) TO !CONCAT(!dmmy_var_label, !dmmy_var_lvls) (F1).
VECTOR #dmmy_vec = !CONCAT(!dmmy_var_label,1) TO !CONCAT(!dmmy_var_label, !dmmy_var_lvls).
STRING #char (A1).
LOOP #pos=1 TO char.length(!src_var).
COMPUTE #char = char.substr(!src_var, #pos, 1).
DO IF (#char NE "," AND #char NE " ").
COMPUTE #index = NUMBER(#char, F1).
COMPUTE #dmmy_vec(#index) = 1.
END IF.
END LOOP.
RECODE !CONCAT(!dmmy_var_label,1) TO !CONCAT(!dmmy_var_label, !dmmy_var_lvls) (SYSMIS=0) (ELSE=COPY).
EXECUTE.
!ENDDEFINE.
* CALL MACRO.
#split_var src_var=VAR2 dmmy_var_lvls=8.

Iterate (/) a multivalent function

How do you iterate a function of multivalent rank (>1), e.g. f:{[x;y] ...} where the function inputs in the next iteration step depend on the last iteration step? Examples in the reference manual only iterate unary functions.
I was able to achieve this indirectly (and verbosely) by passing a dictionary of arguments (state) into unary function:
f:{[arg] key[arg]!(min arg;arg[`y]-2)}
f/[{0<x`x};`x`y!6 3]
Note that projection, e.g. f[x;]/[whilecond;y] would only work in the scenario where the x in the next iteration step does not depend on the result of the last iteration (i.e. when x is path-independent).
In relation to Rahul's answer, you could use one of the following (slightly less verbose) methods to achieve the same result:
q)g:{(min x,y;y-2)}
q)(g .)/[{0<x 0};6 3]
-1 -3
q).[g]/[{0<x 0};6 3]
-1 -3
Alternatively, you could use the .z.s self function, which recursively calls the function g and takes the output of the last iteration as its arguments. For example,
q)g:{[x;y] x: min x,y; y:y-2; $[x<0; (x;y); .z.s[x;y]]}
q)g[6;3]
-1 -3
Function that is used with '/' and '\' can only accept result from last iteration as a single item which means only 1 function parameter is reserved for the result. It is unary in that sense.
For function whose multiple input parameters depends on last iteration result, one solution is to wrap that function inside a unary function and use apply operator to execute that function on the last iteration result.
Ex:
q) g:{(min x,y;y-2)} / function with rank 2
q) f:{x . y}[g;] / function g wrapped inside unary function to iterate
q) f/[{0<x 0};6 3]
Over time I stumbled upon even shorter way which does not require parentheses or brackets:
q)g:{(min x,y;y-2)}
q){0<x 0} g//6 3
-1 -3
Why does double over (//) work ? The / adverb can sometimes be used in place of the . (apply) operator:
q)(*) . 2 3
6
q)(*/) 2 3
6

horner's method of hashing

I know how to get the value of hashing a string by horner's method wich takes a three paramettres String str , int p (prime) and int m like this
p(str)=( sumOf(str(0)+str(1)*M+....+str(n)*M^n) )%p = hashVal
but the problem is how to get the string str by giving just hashVal , p and M
for example if I give you hashval=7 , p = 11 and M = 2 you have to give me a string for example "hello" (not right just a suggestion for understanding)
I mean that I don't know how to do the inverse
and thanks for your help
You can't get a unique input from the hash you describe, since the modulo operation throws information away. If you knew the hashval was 7, M=2 and p=11, as in your example, you wouldn't know whether the sumOf(...) was 7, or 18, or so on.
Even if you did, say you knew it was 5, for an example 2-character string you wouldn't be able to work out whether str(0) was 1 and str(1) was 2, for example, or str(0) was 5 and str(1) was 0.
Hashes are usually hard/impossible to reverse, especially uniquely. The simplest way to solve them is to hash all possible inputs and check their outputs. You'll end up with many inputs with the same hashVal (if p in your example is 3 there are only 3 different hashes).

KDB+/Q: About unused parameters in inner functions

I have this function f
f:{{z+x*y}[x]/[y]}
I am able to call f without a 3rd parameter and I get that, but how is the inner {z+x*y} able to complete without a third parameter?
kdb will assume, if given a single list to a function which takes two parameters, that you want the first one to be x and the remainder to be y (within the context of over and scan, not in general). For example:
q){x+y}/[1;2 3 4]
10
can also be achieved by:
q){x+y}/[1 2 3 4]
10
This is likely what's happening in your example.
EDIT:
In particular, you would use this function like
q){{z+x*y}[x]/[y]}[2;3 4 5 6]
56
which is equivalent to (due to the projection of x):
q){y+2*x}/[3 4 5 6]
56
which is equivalent to (due to my original point above):
q){y+2*x}/[3;4 5 6]
56
Which explains why the "third" parameter wasn't needed
You need to understand 2 things: 'over' behavior with dyadic functions and projection.
1. Understand how over/scan works on dyadic function:
http://code.kx.com/q/ref/adverbs/#over
If you have a list like (x1,x2,x3) and funtion 'f' then
f/(x1,x2,x3) ~ f[ f[x1;x2];x3]
So in every iteration it takes one element from list which will be 'y' and result from last iteration will be 'x'. Except in first iteration where first element will be 'x' and second 'y'.
Ex:
q) f:{x*y} / call with -> f/ (4 5 6)
first iteration : x=4, y=5, result=20
second iteration: x=20, y=6, result=120
2. Projection:
Lets take an example funtion f3 which takes 3 parameters:
q) f3:{[a;b;c] a+b+c}
now we can project it to f2 by fixing (passing) one parameter
q) f2:f3[4] / which means=> f2={[b;c] 4+b+c}
so f2 is dyadic now- it accepts only 2 parameters.
So now coming to your example and applying above 2 concepts, inner function will eventually become dyadic because of projection and then finally 'over' function works on this new dyadic function.
We can rewrite the function as :
f:{
f3:{z+x*y};
f2:f3[x];
f2/y
}

Shorthand arithmetic operators return different results in Scala - EG: 3 + 2 != 3.+(2) [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Scala operator oddity
I'm very new to Scala and I read that in this language everything is an Object, cool. Also, if a method has only 1 argument, then we can omit the '.' and the parentesis '( )', that's ok.
So, if we take the following Scala example: 3 + 2, '3' and '2' are two Int Objects and '+' is a method, sounds ok. Then 3 + 2 is just the shorthand for 3.+(2), this looks very weird but I still get it.
Now let's compare the following blocks on code on the REPL:
scala> 3 + 2
res0: Int = 5
scala> 3.+(2)
res1: Double = 5.0
What's going on here? Why does the explicit syntax return a Double while the shorthand returns an Int??
3. is a Double. The lexer gets there first. Try (3).+(2).