Count all tables in one instance in kdb - kdb

I would like to count all tables in the same instance.
I have not used kdb for a while and I forgot how to make this work.
This is what I got:
tablelist:tables[]
{select count i from x} each tablelist
but I got a type error

Your statement doesn't contain a trailing semi colon ; at the end of the first line which will cause an error in an IDE like qpad (assuming you are running it as written).
If not running from an IDE I would check my hdb for any possible missing data and run some sanity checks (i.e can I select from each of my tables normally, do types match across partitions, i is a virtual column representing row count so issues with non-conforming types in your other columns is probably not a cause but investigating may yield the right answer)
One way to achieve what you're trying is (using dummy data):
q){flip select counts:count i,tab:1#x from x}each tablelist:tables[]
counts tab
-------------
5469 depth
3150 quotes
3005 trades
Here I select the count for each table, but also add on the name of the table, flip each result into a dictionary, which results in a list of dictionaries of conforming types and key names which is in fact a table, hence my result. In this way you have a nice way to track what you're actually counting.

Each select query you run is returning a table in the form:
x
-
3
It would be better to use exec as opposed to select to simply return the value of the count e.g:
q){exec count i from x} each tables[]
3 2
Your current method would be attempting to return a list of tables: e.g:
q){select count i from x} each tables[]
+(,`x)!,,3
+(,`x)!,,2
However, the type error makes me think there may be an issue with your tables as this should not error for in-memory tables.

Here's one way
count each `. tables[]

I am using 3.6 2018.05.17 and your expression worked for me. I then change the select to an exec to return just a list of counts.
q){exec count i from x} each tables[]

Below code helps us get the count of each table along with tablename.
q)flip (`table;`msgcount)! flip {x, count value x}#'tables[]
To get only the count and not the tablename along with it.
q){count value x}#'tables[]

Related

Postgres: Query Values in nested jsonb-structure with unknown keys

I am quite new in working with psql.
Goal is to get values from a nested jsonb-structure where the last key has so many different characteristics it is not possible to query them explicitely.
The jsonb-structure in any row is as follows:
TABLE_Products
{'products':[{'product1':['TYPE'], 'product2':['TYPE2','TYPE3'], 'productN':['TYPE_N']}]}
I want to get the values (TYPE1, etc.) assigned to each product-key (product1, etc.). The product-keys are the unknown, because of too many different names.
My work so far achieves to pull out a tuple for each key:value-pair on the last level. To illustrate this here you can see my code and the results from the previously described structure.
My Code:
select url, jsonb_each(pro)
from (
select id , jsonb_array_elements(data #> '{products}') as pro
from TABLE_Products
where data is not null
) z
My result:
("product2","[""TYPE2""]")
("product2","[""TYPE3""]")
My questions:
Is there a way to split this tuple on two columns?
Or how can I query the values kind of 'unsupervised', so without knowing the exact names of 'product1 ... n'

kdb: getting one row from HDB

For a normal table, we can select one row using select[1] from t. How can I do this for HDB?
I tried select[1] from t where date=2021.02.25 but it gives error
Not yet implemented: it probably makes sense, but it’s not defined nor implemented, and needs more thinking about as the language evolves
select[n] syntax works only if table is already loaded in memory.
The easiest way to get 1st row of HDB table is:
1#select from t where date=2021.02.25
select[n] will work if applied on already loaded data, e.g.
select[1] from select from t where date=2021.02.25
I've done this before for ad-hoc queries by using the virtual index i, which should avoid the cost of pulling all data into memory just to select a couple of rows. If your query needs to map constraints in first before pulling a subset, this is a reasonable solution.
It will however pull N rows for each date partition selected due to the way that q queries work under the covers. So YMMV and this might not be the best solution if it was behind an API for example.
/ 5 rows (i[5] is the 6th row)
select from t where date=2021.02.25, sum=`abcd, price=1234.5, i<i[5]
If your table is date partitioned, you can simply run
select col1,col2 from t where date=2021.02.25,i=0
That will get the first record from 2021.02.25's partition, and avoid loading every record into memory.
Per your first request (which is different to above) select[1] from t, you can achieve that with
.Q.ind[t;enlist 0]

Trying different functions

in the first function, I'm making the job column lowercase and then searching through but it's not finding any data. Why? Thanks. Just FYI since you don't have the database, all records in the JOB column are uppercase (that's why isn't returning anything), but that's also why I'm making it lowercase first.
In the second function, I'm trying to concat only ename with specific criteria --anything that has an r in the ENAME column (there are multiple records with the r in it), but isn't working (no data found), why? How do I get it done? Thanks.
SELECT LOWER(JOB) FROM EMP
WHERE JOB = LOWER('MANAGER');
SELECT CONCAT('My name is ',ename)
FROM EMP
WHERE ENAME LIKE '%r%';
I tested both of your SQL statements and they work fine for me. Are you sure the records are in db? Are you sure the names of the rows are correct?
EDIT : OK, so the name of column is in lower case but in your WHERE its in uppercase. Thats all :)

COUNT(field) returns correct amount of rows but full SELECT query returns zero rows

I have a UDF in my database which basically tries to get a station (e.g. bus/train) based on some input data (geographic/name/type). Inside this function i try to check if there are any rows matching the given values:
SELECT
COUNT(s.id)
INTO
firsttry
FROM
geographic.stations AS s
WHERE
ST_DWithin(s.the_geom,plocation,0.0017)
AND
s.name <-> pname < 0.8
AND
s.type ~ stype;
The firsttry variable now contains the value 1. If i use the following (slightly extended) SELECT statement i get no results:
RETURN query SELECT
s.id, s.name, s.type, s.the_geom,
similarity(
regexp_replace(s.name::text,'(Hauptbahnhof|Hbf)','Hbf'),
regexp_replace(pname::text,'(Hauptbahnhof|Hbf)','Hbf')
)::double precision AS sml,
st_distance(s.the_geom,plocation) As dist from geographic.stations AS s
WHERE ST_DWithin(s.the_geom,plocation,0.0017) and s.name <-> pname < 0.8
AND s.type ~ stype
ORDER BY dist asc,sml desc LIMIT 1;
the parameters are as follows:
stype = '^railway'
pname = 'Amsterdam Science Park'
plocation = ST_GeomFromEWKT('SRID=4326;POINT(4.9492530 52.3531670)')
the tuple i need to be returned is:
id name type geom (displayed as ST_AsText)
909658;"Amsterdam Sciencepark";"railway_station";"POINT(4.9482893 52.352904)"
The same UDF returns quite well for a lot of other stations, but this is one (of more) which just won't work. Any suggestions?
P.S. The use of the <-> operator is coming from the pg_trgm module.
Some ideas on how to troubleshoot this:
Break your troubleshooting into steps. Start with the simplest query possible. No aggregates, just joins and no filters. Then add filters. Then add order by, then add aggregates. Look at exactly where the change occurs.
Try reindexing the database.
One possibility that occurs to me based on this is that it could be a corrupted index used in the second query but not the first. I have seen corrupted indexes in the past and usually they throw errors but at least in theory they should be able to create a problem like this.
If this is correct, your query will suddenly return rows if you remove the ORDER BY clause.
If you have a corrupted index, then you need to pay close attention to hardware. Is the RAM ECC? Is the processor overheating? How are you disks doing?
A second possibility is that there is a typo on a join condition of filter statement. Normally this is something I would suspect first but it is easy enough to weed out index problems to start there. If removing the ORDER BY doesn't change things, then chances are it is a typo. If you can't find a typo, then try reindexing.

How to correctly enum and partition a kdb table?

I put together a few lines to partition my kdb table, which contains string columns of course and thus must to be enumerated.
I wonder if this code is completely correct or if it can be simplified further. In particular, I have some doubt about the need to create a partitioned table schema given the memory table and the disk table will have exactly the same layout. Also, there might be a way to avoid creating the temporary tbl_mem and tbl_mem_enum tables:
...
tbl_mem: select ts,sym,msg_type from oms_mem lj sym_mem;
tbl_mem_enum: .Q.en[`$sym_path] tbl_mem;
delete tbl_mem from `.;
(`$db;``!((17;2;9);(17;2;9))) set ([]ts:`time$(); ticker:`symbol$(); msg_type:`symbol$());
(`$db) upsert (select ts,ticker:sym,msg_type from tbl_mem_enum)
delete tbl_mem_enum from `.;
PS: I know, I shouldn't use "_" to name variables, but then what do I use to separate words in a variable or function name? . is also a kdb function.
I think you mean that your table contains symbol columns - these are the columns that you need to enumerate (strings don't need enumeration). You can do the write and enumeration in a single step. Also if you are using the same compression algo/level on all columns then it may be easier to just use .z.zd:
.z.zd:17 2 9i;
(`$db) set .Q.en[`$sym_path] select ts, ticker:sym, msg_type from oms_mem lj sym_mem;
It's generally recommended to use camelCase instead of '_'. Some useful info here: http://www.timestored.com/kdb-guides/q-coding-standards