Sorting KDB table while excluding Total row - kdb

I have noticed that when using xasc, capital letters take precedence over lower case.
I am trying to exclude Total from the sort, and I want to avoid using "lower" and then recapitalizing afterwards. My solution is below, but it's rather poor code:
t:flip (`active`price`price2)!(`def`abc`xyz`hij`Total;12j, 44j, 468j, 26j, 550j;49j, 83j, 716j, 25j, 873j)
I'm thinking there's a better way than this:
(`active xasc select from t where not active=`Total),select from t where active=`Total

Although it does not match the sort order of your example answer, if you're looking to sort in true lexicographical order ignoring capitals, you could do the following:
q)t:([]active:`def`abc`xyz`hij`Total;price:12 44 468 26 550;price2:49 83 716 25 873)
q)t iasc lower t`active
active price price2
-------------------
abc 44 83
def 12 49
hij 26 25
Total 550 873
xyz 468 716
Otherwise, if you're looking to have the Total row at the bottom following the sort then you will need to append it after doing so - given your example table:
q)(select[<active]from t where active<>`Total),select from t where active=`Total
active price price2
-------------------
abc 44 83
def 12 49
hij 26 25
xyz 468 716
Total 550 873
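For readers more at home outside q, the two ideas above (a case-insensitive sort, and a sort that pins Total to the bottom) can be sketched with a composite sort key. This is a minimal Python illustration rather than q; the `rows` list and the `total_last` helper are hypothetical names, and the data is copied from the example table.

```python
# Rows copied from the example table: (active, price, price2).
rows = [
    ("def", 12, 49),
    ("abc", 44, 83),
    ("xyz", 468, 716),
    ("hij", 26, 25),
    ("Total", 550, 873),
]

# A plain sort compares code points, so the capital "T" sorts before lower case.
plain = sorted(rows, key=lambda r: r[0])

# Case-insensitive sort, similar in spirit to: t iasc lower t`active
ignore_case = sorted(rows, key=lambda r: r[0].lower())


def total_last(table, label="Total"):
    """Sort by name, but force the row named `label` to the end."""
    # False sorts before True, so every non-Total row comes first.
    return sorted(table, key=lambda r: (r[0] == label, r[0]))


pinned = total_last(rows)
```

The composite key avoids splitting the table in two and re-joining it, at the cost of evaluating the comparison against the label for every row.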

There isn't really a much cleaner way to do it, but this approach ensures Total is at the bottom without needing two selects (though it does need a group and a sort):
q)raze`active xasc/:t group`Total=t`active
active price price2
-------------------
abc 44 83
def 12 49
hij 26 25
xyz 468 716
Total 550 873

Matthew's is probably the best all-round solution.
If you know Total is always going to end up first after the sort then:
{1_x,1#x}`active xasc t // sort, join the first row to the end, drop first row
is a pretty concise solution. This is obviously not ideal if you don't control the contents of the active column, as other uppercase entries would make the result unpredictable.

How serializing foreign keyed table works internally in kdb

I have a keyed table (the referenced table) linked by a foreign key to a referencing table, and I serialize both tables using the set operator.
q)kt:([sym:`GOOG`AMZN`FB]; px:20 30 40);
q)`:/Users/uts/db/kt set kt
q)t:([] sym:`kt$5?`GOOG`AMZN`FB; vol:5?10000)
q)`:/Users/uts/db/t set t
Then I remove these tables from the memory
q)delete kt,t from `.
Now I deserialize the table t in memory:
t:get `:/Users/uts/db/t
If I run meta t after this, it fails because it expects kt as the foreign key target.
If I print t, it shows index values in column sym of table t, as expected.
So the question arises:
Since kdb stores the meta of each table (i.e. c, t, f, a) and its corresponding values on disk, how does serialization of table t work internally?
How (in which binary form) are these values stored in file t?
-rw-r--r-- 1 uts staff 100 Apr 13 23:09 t
tl;dr A foreign key is stored as a vector of 4-byte indices into the key column of the referenced table, plus the name of the table the foreign key refers to.
As far as I know kx never documented their file formats, and yet I think some useful information relevant to your question can be deduced right from a q console session.
Let me modify your example a bit to make things simpler.
q)show kt:([sym:`GOOG`AMZN`FB]; px:20 30 40)
sym | px
----| --
GOOG| 20
AMZN| 30
FB | 40
q)show t:([] sym:`kt$`GOOG`GOOG`AMZN`FB`FB)
sym
----
GOOG
GOOG
AMZN
FB
FB
I left only one column - sym - in t because vol is not relevant to the question. Let's save t without any data first:
q)`:/tmp/t set 0#t
`:/tmp/t
q)hcount `:/tmp/t
30
Now we know that it takes 30 bytes to represent t when it's empty. Let's see if there's a pattern when we start adding rows to t:
q){`:/tmp/t set x#t;`cnt`size!(x;hcount[`:/tmp/t] - 30)} each til[11], 100 1000 1000000
cnt size
---------------
0 0
1 4
2 8
3 12
4 16
5 20
6 24
7 28
8 32
9 36
10 40
100 400
1000 4000
1000000 4000000
We can see that adding one row increases the size of t by four bytes. What can these 4 bytes be? Can they be a representation of a symbol itself? No, because if they were and we renamed a sym value in kt it would affect the size of t on disk but it doesn't:
q)update sym:`$50#.Q.a from `kt where sym=`GOOG
`kt
q)1#t
sym
--------------------------------------------------
abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwx
q)`:/tmp/t set 1#t
`:/tmp/t
q)hcount `:/tmp/t
34
Still 34 bytes. I think it should be obvious by now that the 4 bytes are an index, but an index of what? Is it an index into a column which must be called sym exactly? Apparently not.
q)kt:`foo xcol kt
q)t
sym
--------------------------------------------------
abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwx
abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwx
AMZN
FB
FB
There's no column called sym in kt any longer but t hasn't changed at all! We can go even further and change the type of foo (ex sym) in kt:
q)update foo:-1 -2 -3.0 from `kt
`kt
q)t
sym
---
-1
-1
-2
-3
-3
Not only did it change t, it changed its meta too:
q)meta t
c | t f a
---| ------
sym| f kt
q)/ ^------- used to be s
I hope it's clear now that kdb stores a 4-byte index into the key column of the referenced table and the name of that table (but not the key column name!). If the referenced table is missing, kdb can't reconstruct the original data and displays the bare index. If a referencing table needs to be sent over the wire, then the indices are replaced with actual values so that the receiving side can see the real data.
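The behaviour deduced above can be mimicked with a toy encoding. The sketch below is Python, not q, and it is emphatically not the real kdb file format (which, as noted, is undocumented): the layout (a null-terminated table name followed by little-endian 4-byte integers) and the helper names `encode_foreign_key` and `decode_foreign_key` are assumptions made purely for illustration.

```python
import struct


def encode_foreign_key(table_name, key_column, values):
    """Toy layout: referenced table name, a null byte, then 4-byte indices."""
    header = table_name.encode("ascii") + b"\x00"
    indices = [key_column.index(v) for v in values]
    return header + struct.pack("<%di" % len(indices), *indices)


def decode_foreign_key(blob, tables):
    """Resolve indices against the referenced table if it is available."""
    name, _, body = blob.partition(b"\x00")
    indices = struct.unpack("<%di" % (len(body) // 4), body)
    key_column = tables.get(name.decode("ascii"))
    if key_column is None:
        # Referenced table is missing: only the bare indices can be shown.
        return list(indices)
    return [key_column[i] for i in indices]


keys = ["GOOG", "AMZN", "FB"]
blob = encode_foreign_key("kt", keys, ["GOOG", "GOOG", "AMZN", "FB", "FB"])
empty = encode_foreign_key("kt", keys, [])

# Renaming a key value changes what is displayed, not what is stored.
renamed = ["abcdefghijklmnopqrstuvwxyz" * 2, "AMZN", "FB"]
shown = decode_foreign_key(blob, {"kt": renamed})
bare = decode_foreign_key(blob, {})
```

In this toy model each extra row costs exactly four bytes regardless of how long the key values are, which is the same pattern the hcount experiment showed.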

How to solve the below scenario using transformer loop or anything in datastage

My data is like below in one column coming from a file.
Source_data---(This is column name)
CUSTOMER 15
METER 8
METERStatement 1
READING 1
METER 56
Meterstatement 14
Reading 5
Reading 6
Reading 7
CUSTOMER 38
METER 24
METERStatement 1
READING 51
CUSTOMER 77
METER 38
READING 9
I want the output data to be like below in one column
CUSTOMER 15 METER 8 METERStatement 1 READING 1
CUSTOMER 15 METER 56 Meterstatement 14 Reading 5
CUSTOMER 15 METER 56 Meterstatement 14 Reading 6
CUSTOMER 15 METER 56 Meterstatement 14 Reading 7
CUSTOMER 38 METER 24 Meterstatement 1 Reading 51
CUSTOMER 77 METER 38 'pad 100 spaces' Reading 9
I have been trying to solve this by reading the transformer looping documentation but could not figure out an actual solution. Anything helps. Thank you all.
Yes, this could be solved within a transformer stage.
Concatenation is done with ":".
So use a stage variable to concatenate the input until a new "Meter" or "Customer" row comes up.
Save the "Customer" in a second stage variable in case it does not change.
Use a condition to output only the rows where a "Reading" exists.
Reset the concatenated string when a "Reading" has been processed.
I guess you want the padding for missing fields in general. You could do these checks in separate stage variables. You have to store the previous item in order to know what is missing, and maybe even more if two consecutive items could be missing.
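The stage-variable logic described above can be prototyped outside DataStage before building the job. The following is a rough Python sketch, not transformer syntax; the function name `flatten` and the `pad_width` parameter are hypothetical, and each local variable stands in for one stage variable. Note one deliberate deviation: instead of resetting everything after each "Reading", it resets the statement only when a new "Meter" arrives, because the desired output repeats the same meter and statement for Reading 5, 6 and 7.

```python
def flatten(rows, pad_width=100):
    """Emit one output line per READING row, carrying down the parent rows."""
    customer = meter = statement = None  # stand-ins for stage variables
    out = []
    for row in rows:
        kind = row.split()[0].upper()  # the input mixes upper and lower case
        if kind == "CUSTOMER":
            customer, meter, statement = row, None, None
        elif kind == "METER":
            meter, statement = row, None
        elif kind == "METERSTATEMENT":
            statement = row
        elif kind == "READING":
            # A missing parent field is replaced by pad_width spaces.
            fields = [
                f if f is not None else " " * pad_width
                for f in (customer, meter, statement)
            ]
            out.append(" ".join(fields + [row]))
    return out


source_data = [
    "CUSTOMER 15", "METER 8", "METERStatement 1", "READING 1",
    "METER 56", "Meterstatement 14", "Reading 5", "Reading 6", "Reading 7",
    "CUSTOMER 38", "METER 24", "METERStatement 1", "READING 51",
    "CUSTOMER 77", "METER 38", "READING 9",
]
result = flatten(source_data)
```

The sketch keeps the original spelling of each input row, so the case of "METERStatement" and "READING" in the output follows the input rather than being normalised.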

SAS: Combining two data sets with different format

I have two datasets that are formatted differently
data1 looks like:
data1:
YYMM test1
1101 98
1102 98
1103 94
1104 92
1105 99
1106 91
data 2 is just a single grand mean that looks like:
data2:
GM
95
I would like to combine the two and have something that looks like this:
WANT:
YYMM test1 GM
1101 98 95
1102 98 95
1103 94 95
1104 92 95
1105 99 95
1106 91 95
I'm sure there are different ways to go about this, but I thought I should make the 95 into a column and merge it with data1.
Do I have to use a macro for this simple task? Please show me some light!
One straightforward way is to just merge without by statement and the use of retain:
data WANT (drop=temp);
merge DATA1 DATA2 (rename=(GM=temp));
retain GM;
if _N_=1 then GM=temp;
run;
So basically you put the two datasets together.
Because there is no BY statement, the first record of both datasets is joined, then the second record of both datasets, and so on.
At the first record (if _N_=1), you grab the average and put it in a variable whose last value is remembered (retain GM).
So in records 2, 3, etc., the value will still be what you put into it at record 1.
To keep it all clean, I renamed your GM variable on input, so that the name was available for the retained variable. And of course, I dropped the redundant variable.
You can also approach this with a macro variable or PROC SQL, but it's better to keep it simple.
Here's a similar way that's slightly simpler.
data want;
set data1;
if _n_=1 then set data2;
run;
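To make the row-by-row reasoning of the first answer concrete, here is a small Python simulation of the merge-plus-retain logic. It is only a model of the data step, not SAS; the function name `merge_with_retain` is hypothetical, and it assumes data1 has at least as many rows as data2.

```python
from itertools import zip_longest

# (YYMM, test1) rows from data1, and the single grand mean from data2.
data1 = [(1101, 98), (1102, 98), (1103, 94), (1104, 92), (1105, 99), (1106, 91)]
data2 = [95]


def merge_with_retain(rows, means):
    """Pair records positionally and carry the first mean down every row."""
    gm = None  # plays the role of the retained GM variable
    want = []
    # Without a BY statement records are paired by position; once data2
    # runs out, temp is missing (None here) for the remaining rows.
    for n, (row, temp) in enumerate(zip_longest(rows, means), start=1):
        if n == 1:
            gm = temp  # only the first record sets the retained value
        want.append((*row, gm))
    return want


want = merge_with_retain(data1, data2)
```

Every row of `want` ends with 95 because `gm` is assigned once and never overwritten, which is the effect `retain GM` has in the data step.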

DB2 9 Fundamentals

Given the following two tables:
NAMES
NAME NUMBER
---------- -------
Wayne Gretzky 99
Jaromir Jagr 68
Bobby Orr 4
Bobby Hull 23
Mario Lemieux 66
POINTS
NAME POINTS
---------- ------
Wayne Gretzky 244
Bobby Orr 129
Brett Hull 121
Mario Lemieux 189
Joe Sakic 94
How many rows would be returned using the following statement?
SELECT name FROM names, points
Can someone explain why the answer is 25?
Thanks in advance for any help provided
I guess this instruction is equivalent to a cross join in standard SQL. Hence the number of records returned is 5 records in names * 5 records in points = 25.
Also known as the "Cartesian Product"
"The Cartesian product, also referred to as a cross-join, returns all the rows in all the tables listed in the query. Each row in the first table is paired with all the rows in the second table. This happens when there is no relationship defined between the two tables."
from:
http://www.dba-oracle.com/t_garmany_9_sql_cross_join.htm
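The row count can be checked with a quick sketch that pairs every row of one table with every row of the other. This is Python rather than SQL and only models the counting argument, not how DB2 evaluates the statement; the list names are simply the table names from the question.

```python
from itertools import product

names = [
    ("Wayne Gretzky", 99),
    ("Jaromir Jagr", 68),
    ("Bobby Orr", 4),
    ("Bobby Hull", 23),
    ("Mario Lemieux", 66),
]
points = [
    ("Wayne Gretzky", 244),
    ("Bobby Orr", 129),
    ("Brett Hull", 121),
    ("Mario Lemieux", 189),
    ("Joe Sakic", 94),
]

# No join condition: every NAMES row is paired with every POINTS row.
pairs = list(product(names, points))

# Project the name from the NAMES side of each pair.
result = [name_row[0] for name_row, _ in pairs]
```

Each of the 5 names appears once per POINTS row, so `result` holds 5 * 5 = 25 entries.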

filemaker relationships not displaying

I import races into Excel but it has grown to a large spreadsheet and has its limitations.
I have successfully imported a small test database of results.
So far I have the form in the database with the tables and relationships below, but when I try to make a layout to view the form I get the horse name and one line of form; when I scroll down, the same horse displays again with its next run.
I think it's because I have failed to fill the foreign keys in the Horse_Race table, or got the relationships wrong.
I also want to add a Today's runners table but am not sure how to relate it to the existing tables. Is it possible to achieve these aims in FileMaker, or am I barking up the wrong tree? I am at an impasse, but I'm sure it's to do with the relationships somewhere.
Tables as follows:
Course:-
pk_Course_ID, Course
Horse:-
pk_Horse_ID, Horse
Races:-
pk_Race_ID, Course, Rdate, Rtime, Going, Age, Furs, Class, Ran
Horse_Race:-
pk_Run_ID, fk_Course_ID, fk_Horse_ID, fk_Races_ID, Course, RDate, Rtime, Going, Age, Furs, Class, Ran, Pos, Drw, TBtn, Horse, Wgt, MARK, GRD, WA, AA, BHB, BHBAdj, RATING, PPL
Relationships from primary key in each table to foreign keys in Horse_race table.
My aims are as follows.
To view EACH individual horse and its FORM in date order latest run at the top
AJCook (IRE)
DATE CRSE Going Furs Class Ran Pos Drw TBtn Wgt MARK GRD RATING
31-Jul-13 REDC GD 6 6 11 11 1 20.8 133 65 63 -1
08-Jul-13 RIPO GF 6 6 11 7 3 8.25 133 65 65 41
21-Jun-13 REDC GF 5 5 5 1 4 0.02 133 60 56 54
28-May-13 REDC GF 6 5 13 5 6 5.35 124 61 70 35
06-May-13 BEVE GF 5 5 12 8 13 6.15 125 65 73 40
To add a today's runners table with races and runners from each of the day's races, which would loop through each horse and search the database to display the horses and their last 3 ratings (latest on the right), plus the top 9 ratings from the last 3 ratings in order, like so:
HORSE R1 R2 R3 HORSE RATE
A J Cook (IRE) 54 41 -1 Abadejo 57
Aaranyow (IRE) 45 36 48 Abadejo 56
Aarti (IRE) 44 43 40 A J Cook (IRE) 54
Aazif (IRE) 46 43 23 Abadejo 54
Abadejo 56 54 57 Aaranyow (IRE) 48
How do I add the today's runners table, which has the following data:
Date Time,Course,Furs,HorseNo Horse
How will it be related to the tables I already have? Many thanks
Davey H
FileMaker can easily do this. But I am a bit confused with your blocks of text up there, can you format them a little better so that I can see what tables hold which fields please?