Functional q-SQL: How to translate the fby part of the parse tree - kdb

When I run parse"select from trade where date=max date,price=(max;price) fby sym"
I get the following parse tree:
?
`trade
,((=;`date;(max;`date));(=;`price;(k){$[(#x 1)=#y;#[(#y)#x[0]0#x 1;g;:;x[0]'x..
0b
()
I tried interpreting this in functional form as:
?[trade;((=;`date;(max;`date));(=;`price;(k){$[(#x 1)=#y;#[(#y)#x[0]0#x 1;g;:;x[0]'x..;0b;()]
but I get an error pointing to the final. What is wrong with my syntax?

You have two issues here. The first is that your display window is not wide enough to show the full output. To adjust it, use the \c (console size) command:
q)\c 25 200
q)parse"select from trade where date=max date,price=(max;price) fby sym"
?
`trade
,((=;`date;(max;`date));(=;`price;(k){$[(#x 1)=#y;#[(#y)#x[0]0#x 1;g;:;x[0]'x[1]g:.=y];'length]};(enlist;max;`price);`sym)))
0b
()
However, you will still have an issue using this because parse displays the full underlying k definition of fby:
q)fby
k){$[(#x 1)=#y;#[(#y)#x[0]0#x 1;g;:;x[0]'x[1]g:.=y];'length]}
Note that the definition is prefixed with "k)", which causes the error. To get around this, replace the entire k definition above with fby, so that you have
?[trade;((=;`date;(max;`date));(=;`price;(fby;(enlist;max;`price);`sym)));0b;()]
rather than
?[trade;((=;`date;(max;`date));(=;`price;(k){$[(#x 1)=#y;#[(#y)#x[0]0#x 1;g;:;x[0]'x[1]g:.=y];'length]};(enlist;max;`price);`sym)));0b;()]

You can use fby rather than the k definition from parse:
?[trade;(((=;`date;(max;`date));(=;`price;(fby;(enlist;max;`price);`sym))));0b;()!()]

As suggested by Aaron Davies in a KxCon talk a few years ago, you could use wrapper functions to make the code a bit easier to read:
q)trade:([]date:(.z.D-1),3#.z.D;sym:(3#`AA),`BB;price:1+til 4);
q)w:{(parse"select from t where ",x). 2 0}
q)?[trade;w"date=max date,price=(max;price)fby sym";0b;()]
date       sym price
--------------------
2020.09.24 AA  3
2020.09.24 BB  4
This "w" won't cover all possible use-cases but it can be expanded to cover the more general cases.
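To make clear what the where clause above computes, here is a plain-Python sketch of the fby semantics (aggregate per group, then spread the result back onto every row). It is not kdb code; the helper name fby and the fixed dates are illustrative assumptions chosen to mirror the sample table and output above.

```python
from collections import defaultdict

def fby(agg, values, groups):
    """Apply agg within each group and return one result per row, like q's fby."""
    if len(values) != len(groups):
        raise ValueError("length")  # q signals 'length in the same situation
    buckets = defaultdict(list)
    for value, group in zip(values, groups):
        buckets[group].append(value)
    per_group = {group: agg(vals) for group, vals in buckets.items()}
    return [per_group[group] for group in groups]

# Same shape as the sample trade table: one row from "yesterday", three from "today".
trade = [
    {"date": "2020.09.23", "sym": "AA", "price": 1},
    {"date": "2020.09.24", "sym": "AA", "price": 2},
    {"date": "2020.09.24", "sym": "AA", "price": 3},
    {"date": "2020.09.24", "sym": "BB", "price": 4},
]

# Constraints are applied left to right: first date=max date ...
latest = max(row["date"] for row in trade)
rows = [row for row in trade if row["date"] == latest]

# ... then price=(max;price) fby sym, evaluated on the rows that survived.
best = fby(max, [row["price"] for row in rows], [row["sym"] for row in rows])
result = [row for row, top in zip(rows, best) if row["price"] == top]

for row in result:
    print(row["date"], row["sym"], row["price"])
```

The second constraint only sees rows that passed the first one, which is why the per-sym maximum is taken over the latest date only.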

I think your second parameter is getting cut off in your output. Notice how it ends with an ellipsis '..'. When you copy and paste it, you're only picking up part of the code.
,((=;`date;(max;`date));(=;`price;(k){$[(#x 1)=#y;#[(#y)#x[0]0#x 1;g;:;x[0]'x.. <- ellipsis here

Tags in vowpal wabbit

I am doing binary classification using vowpal-wabbit. A particular record (set of features) has 10 zeros and 5 ones, so I am creating two lines in vowpal format:
-1 10 `50 |f f1
1 5 `50 |f f1
Since the prediction (probability) for both of these records would be the same, I want to give them the same tag, so that I can dedupe the predictions ({tag, prediction}) later and join them with my original raw data.
Is it possible to keep the same tag for more than one record in vowpal-wabbit?
First, the syntax above isn't correct.
To be identified as such, a tag should either:
touch the | separator (no space between them), OR
start with a single quote ('), not a backquote (`), by convention
(or both).
Otherwise you get:
warning: `50 is not a good float, replacing with 0
warning: `50 is not a good float, replacing with 0
This hints that vw interprets these "tags" as the prediction base rather than as tags.
For details, see Input format in the official documentation.
With the example fixed to the correct syntax:
-1 10 '50|f f1
1 5 '50|f f1
it runs fine, and we can answer the question:
Is it possible to keep the same tag for more than one record in vowpal-wabbit?
Yes, you can. The tag is merely a simple way to connect input and output (when predictions are involved), there's no check for uniqueness anywhere. If you duplicate tags on input, you'll simply get the same duplicate tags on prediction output as well.
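As a rough Python sketch of the workflow described in the question: write examples with a shared tag, then collapse the duplicated tags in the prediction output. The helper names are hypothetical, and the sketch assumes each line of the predictions file has the form "<prediction> <tag>"; check that against your own output before relying on it.

```python
def vw_line(label, importance, tag, namespace, features):
    """Format one example; the tag starts with ' and touches the | separator."""
    return f"{label} {importance} '{tag}|{namespace} {' '.join(features)}"

def dedupe_by_tag(prediction_lines):
    """Map tag -> first prediction seen, assuming lines look like '<prediction> <tag>'."""
    seen = {}
    for line in prediction_lines:
        prediction, tag = line.split()
        seen.setdefault(tag, float(prediction))
    return seen

# The two examples from the question, sharing tag 50.
examples = [
    vw_line(-1, 10, "50", "f", ["f1"]),
    vw_line(1, 5, "50", "f", ["f1"]),
]
print("\n".join(examples))

# Hypothetical prediction output with the duplicated tag.
print(dedupe_by_tag(["0.25 50", "0.25 50"]))
```

Keeping the first prediction per tag is a design choice; if the model is still learning while predicting, duplicates may differ and you may prefer to average them.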
More notes:
Even if two examples are identical, you may get different predictions, if the model has changed somewhat in between them. Remember vw is an online learner, so the model can continuously change with each example unless you add the -t (test-only, don't learn) option.
Features whose value is zero are ignored, so you can drop them. The standard way in vw to say this is 'positive' and this is 'negative' is to use the values {+1, -1}. This is true for both labels and input features.

Tesseract ambiguity files work different for editing

I want to correct some words like Female and Male, because when I test them they come out as FemaIe and MaIe (with a capital I instead of a lowercase L). I want to solve this issue using an ambiguity file like this:
v1
6_tab_F_e_m_a_I_e_tab_6_tab_F_e_m_a_l_e_tab_1
4_tab_M_a_I_e_tab_4_tab_M_a_l_e_tab_1
But when I retested, my results were worse: Female was recognized as F and Male as M.
What am I doing wrong? Is using an ambiguity file like this the wrong idea?
The fields should be tab-separated, according to the Tesseract Training Wiki.
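A small Python sketch of generating such lines with real tab characters. The helper name is hypothetical, and it assumes that every unichar is a single character and that the characters inside a field are separated by single spaces; verify both against the training documentation for your Tesseract version.

```python
def ambig_line(wrong, right, replace_type=1):
    """Build one ambiguity line: count, characters, count, characters, type."""
    fields = [
        str(len(wrong)),
        " ".join(wrong),   # characters within a field separated by spaces (assumption)
        str(len(right)),
        " ".join(right),
        str(replace_type),
    ]
    return "\t".join(fields)  # fields separated by real tab characters

lines = ["v1", ambig_line("FemaIe", "Female"), ambig_line("MaIe", "Male")]
content = "\n".join(lines) + "\n"
print(content, end="")
```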

Write records from one PF to another without READ operation or DOW loop or move operation.

I know how to copy records from one PF to another by reading one file in a DOW loop and writing to the other, as below. The files are PF1 and PF2, with record formats rec1 and rec2 respectively; each file has only one field, named fld1 and #fld1 respectively:
READ PF1
DOW not %eof(PF1) and not %error
eval fld1 = #fld1
write Rec2
READ PF1
ENDDO
As the comments in Buck's answer mention, your team mate is alluding to using the RPG cycle to process the file. The cycle is basically an implicit read loop of files declared as 'P'rimary.
http://www-01.ibm.com/support/knowledgecenter/ssw_ibm_i_71/rzasc/sc09250726.htm%23wq121
Originally, even RPG IV programs included code to be used as part of the cycle, such as automatically opening files, even if you didn't actually declare any input primary files. Now, however, you can create "Linear Main" programs using the MAIN() h-spec and your program will be cycle free.
Using the cycle is frowned upon in modern RPG, primarily because the implicit nature of what's going on makes it tricky to understand non-trivial code. Additionally, cycle code doesn't perform any better than non-cycle code; it's just less to write. The I/Os being done remain exactly the same.
Finally, again as mentioned in the comments, if you want to optimize performance, use SQL. The set-based nature of SQL beats RPG's one row at a time. I haven't benchmarked it recently, but way back on v5r2 or so, copying 100 or more rows was faster with SQL than RPG.
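To illustrate the set-based idea outside RPG, here is a minimal Python sketch that uses SQLite as a stand-in for the database on IBM i. The table and column names follow the question; the sample rows and the use of sqlite3 are assumptions for illustration only, not how you would do it on the real system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pf1 (fld1 TEXT)")
conn.execute('CREATE TABLE pf2 ("#fld1" TEXT)')  # different column name, same layout
conn.executemany("INSERT INTO pf1 VALUES (?)", [("a",), ("b",), ("c",)])

# Set-based copy: one statement, no explicit read loop, no field-by-field move.
conn.execute("INSERT INTO pf2 SELECT * FROM pf1")
conn.commit()

copied = [row[0] for row in conn.execute('SELECT "#fld1" FROM pf2 ORDER BY rowid')]
print(copied)
```

The single INSERT ... SELECT statement replaces the whole read/eval/write loop, which is the point being made about SQL above.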
For reference only, FWiW; i.e. not recommendations, just examples of what can be done, esp. in cases alluded but for which no specifics were given:
My team mate told me that he can write code for this problem only in 4 lines including declaration of both files in F-spec. He will also not use read, move or dow loop. I don't know how can he do this. That's why I am eager to know this.
The following source is an example Cycle-program; my FLD1 of REC1 had a 10-byte field but I described my output for 20-bytes, so to avoid failed compile per sev-20 RNF7501 "Length of data structure in Result-Field does not equal the record length of Factor 2.", I specified GENLVL(20) on the CRTBNDRPG:
FPF1 IP E DISK rename(rec1:rcd1)
FPF2 O F 20 DISK
DINOUT E DS EXTNAME(PF1)
C WRITE PF2 INOUT
I don't want to use CL program. I just want to do it with a single program either in RPG3 or RPG4
A similar RPG Cycle-program could perform effectively the same thing, similarly copying the data from PF1 to PF2 despite different column name and [thus inherently also] the different record format, using the CL command without a CL program and almost as few lines. The following example depends on the must-always-be-one-row table called QSQPTABL in QSYS2 that would typically be in the system Library List, and the second argument could reflect the actual length of the command string, but just as easily codes the max prototyped length per the Const definition assuring the blank-padding up to that length without actually having to count the [~53] bytes of the concatenated string expression:
FQSQPTABL IP E DISK rename(qsqptabl:qsqptable)
DQcmdExc PR ExtPgm('QSYS/QCMDEXC')
D 200A const
D 15P05 const
c callp QcmdExc('cpyf pf1 pf2 mbropt(*add)'
c +' fmtopt(*nochk) crtfile(*no)':200)
Whereas both of the above sources are probably an enigma to anyone unfamiliar with the Cycle, the overall effects of the latter are quite likely to be inferred correctly [¿perhaps more appropriately described as guessed correctly?], by just about anyone with an understanding of the CL command string, despite their lack of understanding of the Cycle.
And of course, as was also noted, with the SQL the program is probably arguably even easier\simpler; possibly even more readable to the uninitiated [although the WITH NONE clause, shown as WITH NC, added just in case the COMMIT(*NONE) was overlooked on the compile request, probably is not easily intuited]:
C/Exec SQL
C+ insert into pf2 select * from pf1 WITH NC
C/End-Exec
C SETON LR
P.S. The source-code from the OP was originally [at least was, prior to my comment added here] incorrectly coded with eval fld1 = #fld1 when surely what was intended was eval #fld1 = fld1 according to the setup\given.
If you need to use RPG, use embedded SQL. Look up INSERT INTO.
If you aren't limited to RPG, consider CPYF... MBROPT(*ADD).
What business problem are you trying to solve by doing it another way?

How to avoid a meta argument warning in SICStus SPIDER?

This is probably related to a comp.lang.prolog-discussion.
I'm getting several warnings like this using Eclipse with the SICStus SPIDER:
The plain meta argument (Y) is passed as a closure argument
(with 0 suppressed arguments) to the callee.
Here is a code sample:
% Prologs set_of is baroque %% RS-140614 130sec runtime vs. 28sec runtime
:- meta_predicate set_of(+,:,+) .
set_of(X,Y,Z):- %%
setof(X,Y^Y,Z),!; %% Trick to avoid alternatives
Z=[]. %% What is wrong with empty sets ?
How can I get rid of the SPIDER warnings?
I'm not really interested in simply suppressing the warnings.
I'm using the latest version of SPIDER IDE (0.0.51), and SICStus Prolog 4.2.3.
There are several issues in the code you show.
Bad meta argument
First, the built-in predicate setof/3 has the following properties:
?- predicate_property(setof(A,B,C),P).
P = (meta_predicate setof(?,0,?))
; P = built_in
; P = jittable.
which closely corresponds to the ISO declarations in ISO/IEC 13211-1:
8.10.3.2 Template and modes
setof(?term, +callable_term, ?list)
The second argument is a goal to be executed by call/1. No extra arguments are needed. This is what the 0 tells us.
On the other hand, the code you show contains a different meta predicate declaration:
:- meta_predicate(set_of(+,:,+)) .
Here, the second argument is a :. In SICStus, YAP, and SWI, the : means: This argument will be automatically qualified with the current module, such that the module information can be passed further on. Think of asserta(:). Here, the argument is not a goal but a clause.
So to fix this, replace : by 0. You might also indicate this fact in the variable name used: Goal_0 for call(Goal_0), Goal_1 for call(Goal_1, Arg1), Goal_2 for call(Goal_2, Arg1, Arg2), etc.
Bad modes
The + in the first and third argument is inappropriate. The 3rd argument is commonly an uninstantiated variable to be unified with the resulting list.
Prolog's setof/3 baroque?
% Prologs set_of is baroque
The comment probably wants to say that setof/3 contains superfluous ornaments. In fact, setof/3 is much more versatile than the set_of/3 you define. Take this recent question or that. Often you first think about a very specific situation. Say, you want the list of actors of a particular movie. Then, later on, you want to ask what movies there are. It is this generalization which works very smoothly with setof/3, whereas it is extremely complex if you do not have it.
Another very useful way to use setof/3 is when you want to eliminate redundant answers:
?- (X=2;X=1;X=2).
X = 2
; X = 1
; X = 2.
?- setof(t, (X=2;X=1;X=2), _).
X = 1
; X = 2.
Try to emulate that efficiently.
Runtime overheads
They are next to negligible. If you really believe that there are overheads, simply use setof/3 with a single goal. In this manner preprocessing is next to naught.

Parse bit strings in Perl

When working with unpack, I had hoped that b3 would return a bitstring, 3 bits in length.
The code that I had hoped to be writing (for parsing a websocket data packet) was:
my($FIN,$RSV1, $RSV2, $RSV3, $opcode, $MASK, $payload_length) = unpack('b1b1b1b1b4b1b7',substr($read_buffer,0,2));
I noticed that this doesn't do what I had hoped.
If I used b16 instead of the template above, I get the entire 2 bytes loaded into first variable as "1000000101100001".
That's great, and I have no problem with that.
I can use what I've got so far, by doing a bunch of substrings, but is there a better way of doing this? I was hoping there would be a way to process that bit string with a template similar to the one I attempted to make. Some sort of function where I can pass the specification for the packet on the right hand side, and a list of variables on the left?
Edit: I don't want to do this with a regex, since it will be in a very tight loop that will occur a lot.
Edit2: Ideally it would be nice to be able to specify what the bit string should be evaluated as (Boolean, integer, etc).
If I have understood correctly, your goal is to split the 2-bytes input to 7 new variables.
For this purpose you can use bitwise operations. This is an example of how to get your $opcode value:
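my $header = unpack('n', substr($read_buffer, 0, 2)); # first 2 bytes as a big-endian 16-bit integer
my $b4 = $header & 0x0f00; # your mask to filter bits 9-12 (counting from the least significant bit)
$opcode = $b4 >> 8; # right-shift the masked bits down to the low end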
You can do the same manipulations (maybe in a single statement, if you want) for all your variables, and it should execute at a reasonably good speed.
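For comparison, here is the same mask-and-shift approach applied to all seven fields, sketched in Python rather than Perl. The function name is hypothetical and the field layout is the one from the unpack template in the question (1+1+1+1+4+1+7 bits, most significant bit first); the arithmetic translates directly to Perl's >> and & operators.

```python
import struct

def parse_ws_header(buf: bytes) -> dict:
    """Split the first two bytes of a frame into its bit fields."""
    (word,) = struct.unpack("!H", buf[:2])  # big-endian unsigned 16-bit integer
    return {
        "fin": (word >> 15) & 0x1,
        "rsv1": (word >> 14) & 0x1,
        "rsv2": (word >> 13) & 0x1,
        "rsv3": (word >> 12) & 0x1,
        "opcode": (word >> 8) & 0xF,
        "mask": (word >> 7) & 0x1,
        "payload_length": word & 0x7F,
    }

header = parse_ws_header(b"\x81\x86")
print(header)
```

Each field is one shift and one mask, so there is no string handling in the hot loop; single-bit fields can be treated as booleans and the wider ones as integers.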