Read function names and args from a table, execute each row, and store the output in a single table - kdb

I have a data.csv file like the one below, where each row has a function name and a dictionary of arguments:
function,args
fun1,(`startDate`endDate`sym`rollPerct`expDateThreshold`expDateThresholdExpiry)!(.z.D-5;.z.D;`AAPL;0.8;10;1)
fun2,(`startDate`endDate`sym`rollPerct`expDateThreshold`expDateThresholdExpiry)!(.z.D-5;.z.D;`MSFT`ZAK;0.8;10;1)
fun3,(`startDate`endDate`sym`rollPerct`expDateThreshold`expDateThresholdExpiry)!(.z.D-5;.z.D;`NAFK;0.8;10;1)
I read the data with:
tab:("S*";enlist ",") 0:`:data.csv
Now I want to iterate over all rows of the table, calling each function on its arguments as below, and save all three results to a single table res:
fun1 [(`startDate`endDate`sym`rollPerct`expDateThreshold`expDateThresholdExpiry)!(.z.D-5;.z.D;`AAPL;0.8;10;1)]
fun2 [(`startDate`endDate`sym`rollPerct`expDateThreshold`expDateThresholdExpiry)!(.z.D-5;.z.D;`MSFT`ZAK;0.8;10;1)]
fun3 [(`startDate`endDate`sym`rollPerct`expDateThreshold`expDateThresholdExpiry)!(.z.D-5;.z.D;`NAFK;0.8;10;1)]
Below is my code to iterate over fun1[args], fun2[args] and fun3[args] and combine all three results into a single table. I used a while loop, but is there something better than a loop here?
cnt:count tab; //number of rows in the input table
ino:0; //initialize the loop counter to 0
res::flip (`date`sym`ric!(`date$();`symbol$();`symbol$())); //create a global table to hold the results of each iteration
//upsert fun1[args], fun2[args] and fun3[args] into res, one row at a time
while[ino<cnt;
 data:exec .[first function;args] from tab where i=ino;
 upsert[`res;data];
 ino:ino+1
 ]
//res now holds the results of fun1, fun2 and fun3
res

If your inputs are in the correct order for all functions, the following simple example should work:
q)f1:{x+y+z+2};f2:{x*y*z*22};f3:{x%y%z%42};
q)tab:([]func:`f1`f2`f3;args:`x`y`z!/:3 cut til 9)
q)tab
func args
-----------------
f1 `x`y`z!0 1 2
f2 `x`y`z!3 4 5
f3 `x`y`z!6 7 8
q)update res:func .'get'[args]from tab
func args res
---------------------------
f1 `x`y`z!0 1 2 5
f2 `x`y`z!3 4 5 1320
f3 `x`y`z!6 7 8 0.1632653
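For readers less used to q's each-apply, here is a rough Python analog of the same table-driven dispatch. It is only an illustration and not kdb code. The registry dict FUNCS and the row layout are hypothetical. `fn(*args.values())` mirrors `get'[args]` by passing the dictionary values positionally, which is why the argument order must match each function's signature.

```python
# Hypothetical functions mirroring f1, f2, f3 from the q session above.
def f1(x, y, z):
    return x + y + z + 2

def f2(x, y, z):
    return x * y * z * 22

def f3(x, y, z):
    # q evaluates right to left, so x%y%z%42 means x%(y%(z%42)).
    return x / (y / (z / 42))

# Hypothetical registry mapping function names (the q symbols) to callables.
FUNCS = {"f1": f1, "f2": f2, "f3": f3}

# Each row holds a function name and a dict of arguments, like the q table.
rows = [
    {"func": "f1", "args": {"x": 0, "y": 1, "z": 2}},
    {"func": "f2", "args": {"x": 3, "y": 4, "z": 5}},
    {"func": "f3", "args": {"x": 6, "y": 7, "z": 8}},
]

def run_all(rows):
    """Apply each row's function to its argument values (positionally, in
    dict insertion order) and return new rows with an added 'res' column."""
    return [
        {**row, "res": FUNCS[row["func"]](*row["args"].values())}
        for row in rows
    ]

res = run_all(rows)
for r in res:
    print(r["func"], r["res"])
```

As in the q version, the whole table is processed in one expression rather than an explicit counter loop. Python dicts keep insertion order (3.7+), which is what makes the positional call well defined.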
NB: if your loaded args are strings, you'll want to parse them.
For example, taking the above again:
q)tab:update .Q.s1'[args]from tab
q)tab
func args
-------------------
f1 "`x`y`z!0 1 2"
f2 "`x`y`z!3 4 5"
f3 "`x`y`z!6 7 8"
q)meta tab
c | t f a
----| -----
func| s
args| C
q)tab:update('[reval;parse])'[args]from tab
q)tab
func args
-----------------
f1 `x`y`z!0 1 2
f2 `x`y`z!3 4 5
f3 `x`y`z!6 7 8
q)meta tab
c | t f a
----| -----
func| s
args|
q)update res:func .'get'[args]from tab
func args res
---------------------------
f1 `x`y`z!0 1 2 5
f2 `x`y`z!3 4 5 1320
f3 `x`y`z!6 7 8 0.1632653
reval in the above will try to stop anything dodgy from being run, but I would avoid parsing code straight from files where possible.
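The same caution applies outside q. As an analogy only (this is Python, not kdb), `ast.literal_eval` plays a role similar to `reval` here. It parses a string into a data structure but refuses anything that is not a plain literal, such as a function call. The argument strings below are hypothetical Python equivalents of the q dictionaries.

```python
import ast

# Hypothetical argument strings, as they might be loaded from a file.
arg_strings = [
    "{'x': 0, 'y': 1, 'z': 2}",
    "{'x': 3, 'y': 4, 'z': 5}",
]

def parse_args(s):
    """Parse a literal dict from a string and reject anything else."""
    value = ast.literal_eval(s)  # literals only: no names, calls or attribute access
    if not isinstance(value, dict):
        raise ValueError(f"expected a dict literal, got {type(value).__name__}")
    return value

parsed = [parse_args(s) for s in arg_strings]
print(parsed)

# A "dodgy" string containing a function call is rejected, not executed.
try:
    parse_args("__import__('os').getcwd()")
except ValueError as exc:
    print("rejected:", exc)
```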


KDB Table enlist function call not running

I have a simple problem:
f2:{[x;y]
r:sum(x)*sum(y);
r
};
tm:([] pr:(100.01 100.02;100.03 100.04); rv:(15.72 55.64; 16.92 15.17 12.21 34.99))
f2 each [tm`rv][tm`pr]
The result I get is
{[x;y]
r:sum(x)*sum(y);
r
}[(15.72 55.64;16.92 15.17 12.21 34.99)'[(100.01 100.02;100.03 100.04)]]
What I want is to sum each element of tm`rv and of tm`pr, and multiply the two sums row by row.
tm
pr rv
-------------------------------------
100.01 100.02 15.72 55.64
100.03 100.04 16.92 15.17 12.21 34.99
Hi, you can sum each nested list then do the multiplication:
select result:(sum each pr)*sum each rv from tm
result
--------
14274.14
15863.55
q)
But if you want to use your f2 function, {[x;y] r:sum(x)*sum(y); r }, you should do this:
f2'[tm`rv;tm`pr]
14274.14 15863.55
q)
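' (each) applies f2 to corresponding pairs of items from the two argument lists: the first rv with the first pr, and the second rv with the second pr.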
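Here is a Python analogy (not q) that makes the difference concrete. `zip` pairs corresponding items the way `f2'[x;y]` does. `itertools.product` forms every combination, which in q is closer to each-left combined with each-right (`/:\:`). The data are the `tm` columns from the question.

```python
from itertools import product

def f2(x, y):
    # Same logic as the q f2: sum each list, then multiply.
    return sum(x) * sum(y)

rv = [[15.72, 55.64], [16.92, 15.17, 12.21, 34.99]]
pr = [[100.01, 100.02], [100.03, 100.04]]

# Each-both: f2 on corresponding pairs (rv[0], pr[0]) and (rv[1], pr[1]).
pairwise = [f2(a, b) for a, b in zip(rv, pr)]
print([round(v, 2) for v in pairwise])

# Cross: f2 on every combination, giving 2 x 2 = 4 results.
cross = [f2(a, b) for a, b in product(rv, pr)]
print(len(cross))
```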

CLLE SNDRCVF command not allowed

I am trying to compile this piece of CL code using Rational Series but keep getting an error.
This is my CL code:
PGM
DCLF FILE(LAB4DF)
SNDRCVF RCDFMT(RECORD1) /* send, receive file */
DOWHILE (&IN03 = '0')
SELECT
WHEN (&USERINPUT = '1' *OR &USERINPUT = '01') CALLSUBR OPTION1
OTHERWISE DO
*IN03 = '1'
ENDDO
ENDSELECT
ENDDO
SUBR OPTION1
DSPLIBL
ENDSUBR
ENDPGM
And this is my DSPF code:
A R RECORD1
A 1 38'LAB 4'
A 3 3'Please select one of the following-
A options:'
A 6 11'3. Maximum Invalid Signon Attempt-
A s allowed'
A 8 11'5. Run Instructor''s Insurance Pr-
A ogram'
A 5 11'2. Signed on User''s Message Queu-
A e'
A 1 3'Yathavan Parameshwaran'
A 7 11'4. Initial number of active jobs -
A for storage allocation'
A 4 11'1. Previous sign on by signed on -
A user'
A 14 11'F3 = Exit'
A 14 31'F21 = Command Line'
A 2 70TIME
A 1 72DATE
A 9 11'Option: '
A USERINPUT 2 B 9 19
A 91 DSPATR(RI)
A 92 DSPATR(PC)
A MSGTXT1 70 O 11 11
A MSGTXT2 70 O 12 11
Is there a problem with my CL code or DSPF code?
You forgot to say what error you were getting. It's always important to put all the information about error messages into your questions.
There are two errors.
&IN03 is not defined.
Your assignment to *IN03 should be to &IN03, but that's not how you do an assignment in CLP.
If you want to be able to press F3, you have to code something like CA03(03) in the "Functions" for the record format.
To assign a variable in CL, code
CHGVAR VAR(&name) VALUE(value)
for example CHGVAR VAR(&IN03) VALUE('1').
Looking at the documentation here, I suspect you need to add RCDFMT to your DCLF spec like so:
DCLF FILE(LAB4DF) RCDFMT(RECORD1)
SNDRCVF RCDFMT(RECORD1) /* send, receive file */
If you really do only have 1 record format in your display file, then you can also omit the RCDFMT from both commands like so:
DCLF FILE(LAB4DF)
SNDRCVF /* send, receive file */

How to traverse M*N grid in KDB

How can I traverse an m*n grid in q, moving up, right or diagonally, to find how many possible ways there are to reach the end point?
Like below: starting from (0 0), each step branches to (0 1), (1 1) and (1 0). Every branch keeps branching the same way, and the paths that arrive at the end point (2 2) are the ones to count.
One way of doing it is to use .z.s to recursively call the function with different arguments, summing the results to give the total number of paths.
f:{
// When you reach a wall, there is only one way to the corner, so return a single valid path
if[any 1=(x;y);:1];
// Otherwise spawn 3 paths - one up, one right and one diagonally
:.z.s[x-1;y] + .z.s[x;y-1] + .z.s[x-1;y-1]
}
q)f[2;2]
3
q)f[2;3]
5
q)f[3;3]
13
If you are travelling along the edges rather than the squares, you can change the first line to:
if[any 0=(x;y);:1];
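The recursion above recomputes the same sub-grids many times, which is where the exponential cost comes from. Below is a Python sketch of the same recurrence that adds a cache. Memoization is a swapped-in technique and not part of the original answer. The `wall` parameter is a hypothetical addition that selects between the two base cases above: 1 for squares, 0 for edges.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def paths(x: int, y: int, wall: int = 1) -> int:
    """Count paths with the same recurrence as f (up, right or diagonal)."""
    if x < wall or y < wall:
        raise ValueError("x and y must not be below the wall value")
    # Reached a wall: only one way to the corner.
    if x == wall or y == wall:
        return 1
    # Otherwise branch three ways; repeated sub-grids come from the cache.
    return (paths(x - 1, y, wall)
            + paths(x, y - 1, wall)
            + paths(x - 1, y - 1, wall))

print(paths(2, 2), paths(2, 3), paths(3, 3))  # squares variant
print(paths(3, 3, 0))                          # edges variant
```

With the cache, each (x, y) pair is computed only once, so the work grows roughly with x*y instead of exponentially.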
A closed-form solution is just the Delannoy number, which could be implemented something like this when you are travelling along edges:
d:{
k:1+min(x;y);
f:{prd 1+til x};
comb:{[f;m;n] f[m] div f[n]*f[m-n]}[f];
(sum/) (2 xexp til k) * prd (x;y) comb/:\: til k
}
q)d[3;3]
63f
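This is much quicker for larger boards, as I think the complexity of the first solution is O(3^(m+n)) while that of the second is roughly O(m*n). Note, though, that the factorials in d are computed as 64-bit longs, which overflow beyond 20!, so the results from d are only reliable while both dimensions are at most 20.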
q)\t f[7;7]
13
q)\t f[10;10]
1924
q)\t d[7;7]
0
q)\t d[100;100]
1
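Here is a Python sketch of the same closed form, D(m, n) = sum over k of 2^k * C(m, k) * C(n, k). It is an illustration and not a drop-in replacement for the q function. It uses `math.comb` (Python 3.8+), so the binomials are computed directly with arbitrary-precision integers and never go through factorial division that could overflow.

```python
from math import comb

def delannoy(m: int, n: int) -> int:
    """Delannoy number D(m, n) = sum_{k=0}^{min(m,n)} 2^k * C(m,k) * C(n,k).
    Python ints are arbitrary precision, so large boards stay exact."""
    return sum(2**k * comb(m, k) * comb(n, k) for k in range(min(m, n) + 1))

print(delannoy(3, 3))               # same board as d[3;3]
print(delannoy(100, 100) > 2**63)   # well beyond the 64-bit range
```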

An issue with argument "sortv" of function seqIplot()

I'm trying to plot individual sequences with the function seqIplot() in TraMineR. These individual sequences represent work trajectories reported by a school's former graduates via a web questionnaire.
Using argument "sortv", I'd like to sort my sequences according to the order of the levels of one covariate, the year of graduation, named "PROMO".
"PROMO" is a factor variable contained in a data frame named "covariates.seq", gathering covariates together:
str(covariates.seq)
'data.frame': 733 obs. of 6 variables:
$ ID_SQ : Factor w/ 733 levels "1","2","3","5",..: 1 2 3 4 5 6 7 8 9 10 ...
$ SEXE : Factor w/ 2 levels "Féminin","Masculin": 1 1 1 1 2 1 1 2 2 1 ...
$ PROMO : Factor w/ 6 levels "1997","1998",..: 1 2 2 4 4 3 2 2 2 2 ...
$ DEPARTEMENT : Factor w/ 10 levels "BC","GCU","GE",..: 1 4 7 8 7 9 9 7 7 4 ...
$ NIVEAU_ADMISSION: Factor w/ 2 levels "En Premier Cycle",..: NA 1 1 1 1 1 NA 1 1 1 ...
$ FILIERE_SECTION : Factor w/ 4 levels "Cursus Classique",..: NA 4 2 NA 1 1 NA NA 4 3 ...
I'm also using "SEXE", the graduates' gender, as a grouping variable. To plot the individual sequences this way, my command is as follows:
seqIplot(sequences, group = covariates.seq$SEXE,
sortv = covariates.seq$PROMO,
cex.axis = 0.7, cex.legend = 0.7)
I expected that, by using a process time axis (with the year of graduation as sequence-dependent origin), sorting the sequences according to the order of the levels of "PROMO" would give a plot with groups of sequences from the longest (for the older graduates) to the shortest (for the younger graduates).
But I've got an issue: in the output plot, the sequences don't appear to be correctly sorted according to the levels of "PROMO". Indeed, by using "sortv = covariates.seq$PROMO" as in the command above, the plot doesn't show groups of sequences from the longest to the shortest, as expected. It looks like the plot obtained without using the argument "sortv" (see Figures below).
[Figure: plot without using argument "sortv"]
[Figure: plot using "sortv = covariates.seq$PROMO"]
Note that I have 733 individual sequences in my object "sequences", created as follows:
labs <- c("En poste", "Au chômage (d'au moins 6 mois)", "Autre situation (d'au moins 6 mois)", "En poursuite d'études (thèse ou hors thèse)", "En reprise d'études / formation (d'au moins 6 mois)")
codes <- c("En poste", "Au chômage", "Autre situation", "En poursuite d'études", "En reprise d'études / formation")
sequences <- seqdef(situations, alphabet = labs, states = codes, left = NA, right = "DEL", missing = NA,
cnames = as.character(seq(0,7400/365,1/365)),
xtstep = 365)
The values of the covariates are sorted in the same order as the individual sequences. The covariate "PROMO" doesn't contain any missing value.
Something's going wrong, but what?
Thank you in advance for your help,
Best,
Arnaud.
Using a factor as the sortv argument in seqIplot works fine, as illustrated by the example below:
sdc <- c("aabbccdd","bbbccc","aaaddd","abcabcab")
sd <- seqdecomp(sdc, sep="")
seq <- seqdef(sd)
fac <- factor(c("2000","2001","2001","2000"))
par(mfrow=c(1,3))
seqIplot(seq, with.legend=FALSE)
seqIplot(seq, sortv=fac, with.legend=FALSE)
seqlegend(seq)

Spark: All RDD data not getting saved to Cassandra table

Hi, I am trying to load RDD data into a Cassandra column family using Scala. Out of a total of 50 rows, only 28 are getting stored in the Cassandra table.
Below is the Code snippet:
val states = sc.textFile("state.txt")
//list of all the 50 states of the USA
var n =0 // corrected to var
val statesRDD = states.map{a =>
n=n+1
(n, a)
}
scala> statesRDD.count
res2: Long = 50
cqlsh:brs> CREATE TABLE BRS.state(state_id int PRIMARY KEY, state_name text);
statesRDD.saveToCassandra("brs","state", SomeColumns("state_id","state_name"))
// this statement saves only 28 rows out of 50, not sure why!
cqlsh:brs> select * from state;
state_id | state_name
----------+-------------
23 | Minnesota
5 | California
28 | Nevada
10 | Georgia
16 | Kansas
13 | Illinois
11 | Hawaii
1 | Alabama
19 | Maine
8 | Oklahoma
2 | Alaska
4 | New York
18 | Virginia
15 | Iowa
22 | Wyoming
27 | Nebraska
20 | Maryland
7 | Ohio
6 | Colorado
9 | Florida
14 | Indiana
26 | Montana
21 | Wisconsin
17 | Vermont
24 | Mississippi
25 | Missouri
12 | Idaho
3 | Arizona
(28 rows)
Can anyone please help me in finding where the issue is?
Edit:
I now understand why only 28 rows are getting stored in Cassandra: I made the first column the PRIMARY KEY, and in my code n is incremented up to a maximum of 28 and then starts again from 1 up to 22 (50 in total), so rows with the same key overwrite each other.
val states = sc.textFile("states.txt")
var n =0
var statesRDD = states.map{a =>
n+=1
(n, a)
}
I tried making n an accumulator variable as well (val n = sc.accumulator(0,"Counter")), but I don't see any difference in the output.
scala> statesRDD.foreach(println)
[Stage 2:> (0 + 0) / 2]
(1,New Hampshire)
(2,New Jersey)
(3,New Mexico)
(4,New York)
(5,North Carolina)
(6,North Dakota)
(7,Ohio)
(8,Oklahoma)
(9,Oregon)
(10,Pennsylvania)
(11,Rhode Island)
(12,South Carolina)
(13,South Dakota)
(14,Tennessee)
(15,Texas)
(16,Utah)
(17,Vermont)
(18,Virginia)
(19,Washington)
(20,West Virginia)
(21,Wisconsin)
(22,Wyoming)
(1,Alabama)
(2,Alaska)
(3,Arizona)
(4,Arkansas)
(5,California)
(6,Colorado)
(7,Connecticut)
(8,Delaware)
(9,Florida)
(10,Georgia)
(11,Hawaii)
(12,Idaho)
(13,Illinois)
(14,Indiana)
(15,Iowa)
(16,Kansas)
(17,Kentucky)
(18,Louisiana)
(19,Maine)
(20,Maryland)
(21,Massachusetts)
(22,Michigan)
(23,Minnesota)
(24,Mississippi)
(25,Missouri)
(26,Montana)
(27,Nebraska)
(28,Nevada)
What is causing n to stop being updated after 28? Also, how can I create a counter to use when creating an RDD?
There are some misconceptions about distributed systems embedded inside your question. The real heart of this is "How do I have a counter in a distributed system?"
The short answer is that you don't. For example, what your original code does is something like this:
Task One {
var x = 0
record 1: x = 1
record 2: x = 2
}
Task Two {
var x = 0
record 20: x = 1
record 21: x = 2
}
Each task independently creates its own copy of x, initialized to 0, which is incremented within its own context, independently of the other tasks.
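You can reproduce the effect without Spark. The Python simulation below is a sketch and not Spark code. It assumes two partitions of 28 and 22 records, matching the counts in the question's output, and gives each simulated task its own copy of n starting at 0. It then writes the (n, name) pairs into a dict keyed by n, which stands in for a Cassandra table whose primary key is state_id. Later writes with the same key overwrite earlier ones.

```python
# Hypothetical stand-in for the 50 lines of states.txt.
states = [f"state_{i}" for i in range(50)]

# Assumed split into two partitions, as in the question's output (28 + 22).
partitions = [states[:28], states[28:]]

def run_task(partition):
    """Each task works on its own copy of the closure, so n restarts at 0."""
    n = 0
    out = []
    for name in partition:
        n += 1
        out.append((n, name))
    return out

rows = [row for part in partitions for row in run_task(part)]

# Simulated table keyed by state_id: writing an existing key overwrites it.
table = {}
for state_id, name in rows:
    table[state_id] = name

print(len(rows), len(table))
```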
For most use cases the "counter" question can be replaced with "How can I get a Unique Identifier per Record in a distributed system?"
For this, most users end up using a UUID, which can be generated on independent machines with a negligible chance of collisions.
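If the question is instead "How can I get a monotonically increasing unique identifier?",
then you can use zipWithIndex, which assigns consecutive indices (0 to N-1) in partition order but has to run an extra Spark job to size the partitions first when the RDD has more than one. If you only need unique ids, zipWithUniqueId avoids that job; it does not count, so the ids can have gaps.
If you just want the records numbered from the start, it's best to do the numbering on the local system.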
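You can simulate these ID schemes in plain Python to see how they differ. This is a sketch under assumptions and not Spark itself. `uuid.uuid4` stands in for per-record UUIDs. The two helpers reimplement the documented numbering of Spark's zipWithUniqueId (item i of partition k gets i*n + k, where n is the number of partitions) and zipWithIndex (consecutive indices in partition order). zipWithIndex needs every partition's size first, which is why Spark runs an extra job for it.

```python
import uuid

# Hypothetical partitions of an RDD (3 partitions of unequal size).
partitions = [["a", "b", "c"], ["d"], ["e", "f"]]

def with_uuid(parts):
    # Each record gets an independently generated random UUID.
    return [(str(uuid.uuid4()), x) for part in parts for x in part]

def with_unique_id(parts):
    # Like zipWithUniqueId: no coordination between partitions, gaps allowed.
    n = len(parts)
    return [(i * n + k, x) for k, part in enumerate(parts) for i, x in enumerate(part)]

def with_index(parts):
    # Like zipWithIndex: partition sizes are needed first to compute offsets.
    offsets, total = [], 0
    for part in parts:
        offsets.append(total)
        total += len(part)
    return [(offsets[k] + i, x) for k, part in enumerate(parts) for i, x in enumerate(part)]

print(with_unique_id(partitions))
print(with_index(partitions))
```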
Edit: Why can't I use an accumulator?
Accumulators store their state (surprise) per task. You can see this with a little example:
val x = sc.accumulator(0, "x")
sc.parallelize(1 to 50).foreachPartition{ it => it.foreach(y => x+= 1); println(x)}
/*
6
7
6
6
6
6
6
7
*/
x.value
// res38: Int = 50
The accumulators combine their state after finishing their tasks, which means you can't use them as a global distributed counter.
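Here is a rough Python model of that behaviour. It is a simulation and not Spark's implementation. Each simulated task adds into its own local copy starting at zero, and the driver merges the copies only after the tasks are done. The split of 1 to 50 into 8 slices assumes the start = i*len/numSlices boundaries that, as far as we know, parallelize uses for ranges. The exact partition sizes do not matter to the point.

```python
def slice_range(items, num_slices):
    # Assumed slicing: partition i covers [i*len/num, (i+1)*len/num).
    length = len(items)
    return [items[i * length // num_slices:(i + 1) * length // num_slices]
            for i in range(num_slices)]

def run_job(items, num_slices):
    local_values = []
    for part in slice_range(items, num_slices):
        local = 0                   # each task starts from a fresh, zeroed copy
        for _ in part:
            local += 1              # like x += 1 inside foreachPartition
        local_values.append(local)  # what println(x) shows inside the task
    driver_value = sum(local_values)  # merged on the driver after tasks finish
    return local_values, driver_value

local_values, driver_value = run_job(list(range(1, 51)), 8)
print(local_values)
print(driver_value)
```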