Select from a table with Limit expression works, without - fails - kdb

For a table t with a custom field c which is dictionary I could use select with limit expression, but simple select failes:
q)r1: `n`m`k!111b;
q)r2: `n`m`k!000b;
q)t: ([]a:1 2; b:10 20; c:(r1; r2));
q)t
a b c
----------------
1 10 `n`m`k!111b
2 20 `n`m`k!000b
q)select[2] c[`n] from t
x
-
1
0
q)select c[`n] from t
'type
[0] select c[`n] from t
^
Is it a bug, or am I missing something?
Upd:
Why does select [2] c[`n] from t work here?

Since c is a list, it does not support key indexing which is why it has returned a type
You need to index into each element instead of trying to index the column.
q)select c[;`n] from t
x
-
1
0
A list of confirming dictionaries outside of this context is equivalent to a table, so you can index like you were
q)c:(r1;r2)
q)type c
98h
q)c[`n]
10b
I would say that the way complex columns are represented in memory makes this not possible. I suspect that any modification that creates a copy of a subset of the elements will allow column indexing as the copy will be formatted as a table.
One example here is serialising and deserialising the column (not recommended to do this). In the case of select[n] it is selecting a subset of 2 elements
q)type exec c from t
0h
q)type exec -9!-8!c from t
98h
q)exec (-9!-8!c)[`n] from t
10b

Related

select only those columns from table have not null values in q kdb

I have a table:
q)t:([] a:1 2 3; b:```; c:`a`b`c)
a b c
-----
1 a
2 b
3 c
From this table I want to select only the columns who have not null values, in this case column b should be omitted from output.(something similar to dropna method in pandas).
expected output
a c
---
1 a
2 b
3 c
I tried many things like
select from t where not null cols
but of no use.
Here is a simple solution that does just what you want:
q)where[all null t]_t
a c
---
1 a
2 b
3 c
[all null t] gives a dictionary that checks if the column values are all null or not.
q)all null t
a| 0
b| 1
c| 0
Where returns the keys of the dictionary where it is true
q)where[all null t]
,`b
Finally you use _ to drop the columns from table t
Hopefully this helps
A modification of Sander's solution which handles string columns (or any nested columns):
q)t:([] a:1 2 3; b:```; c:`a`b`c;d:" ";e:("";"";"");f:(();();());g:(1 1;2 2;3 3))
q)t
a b c d e f g
----------------
1 a "" 1 1
2 b "" 2 2
3 c "" 3 3
q)where[{$[type x;all null x;all 0=count each x]}each flip t]_t
a c g
-------
1 a 1 1
2 b 2 2
3 c 3 3
The nature of kdb is column based, meaning that where clauses function on the rows of a given column.
To make a QSQL query produce your desired behaviour, you would need to first examine all your columns and determine which are all null, and then feed that into a functional statement. Which would be horribly inefficient.
Given that you need to fully examine all the columns data regardless (to check if all the values are null) the following will achieve that
q)#[flip;;enlist] k!d k:key[d] where not all each null each value d:flip t
a c
---
1 a
2 b
3 c
Here I'm transforming the table into a dictionary, and extracting its values to determine if any columns consist only of nulls (all each null each). I'm then applying that boolean list to the keys of the dictionary (i.e., the column names) through a where statement. We can then reindex into the original dictionary with those keys and create a subset dictionary of non-null columns and convert that back into a table.
I've generalized the final transformation back into a table by habit with an error catch to ensure that the dictionary will be converted into a table even if only a single row is valid (preventing a 'rank error)

Can (aggregate) functions be used to define a column?

Assume a table like this one:
a | b | total
--|---|------
1 | 2 | 3
4 | 7 | 11
…
CREATE TEMPORARY TABLE summedup (
a double precision DEFAULT 0
, b double precision DEFAULT 0
--, total double precision
);
INSERT INTO summedup (a, b) VALUES (1, 2);
INSERT INTO summedup (a, b) VALUES (4, 7);
SELECT a, b, a + b as total FROM summedup;
It's easy to sum up the first two columns on SELECT.
But does Postgres (9.6) also support the ability to define total as the sum of the other two columns? If so:
What is the syntax?
What is this type of operation called (aggregates typically sum up cells over multiple rows, not columns.)
What you are looking for is typically called a "computed column".
Postgres 9.6 does not support that (Postgres 12 - to be released in Q4 2019 - will).
But for such a simple sum, I wouldn't bother storing redundant information.
If you don't want to repeat the expression, create a view.
I think what you want is a View.
CREATE VIEW table_with_sum AS
SELECT id, a, b, a + b as total FROM summedup;
then you can query the view for the sum.
SELECT total FROM table_with_sum where id=5;
The View does not store the sum for each row, the totalcolumn is computed every time you query the View. If your goal is to make your query more efficient, this will not help.
There is an other way: add the column to the table and create triggers for update and insert that update the total column every time a row is modified.

Best way to maintain an ordered list in PostgreSQL?

Say I have a table called list, where there are items like these (the ids are random uuids):
id rank text
--- ----- -----
x 0 Hello
x 1 World
x 2 Foo
x 3 Bar
x 4 Baz
I want to maintain the property that rank column always goes from 0 to n-1 (n being the number of rows)---if a client asks to insert an item with rank = 3, then the pg server should push the current 3 and 4 to 4 and 5, respectively:
id rank text
--- ----- -----
x 0 Hello
x 1 World
x 2 Foo
x 3 New Item!
x 4 Bar
x 5 Baz
My current strategy is to have a dedicated insertion function add_item(item) that scans through the table, filter out items with rank equal or greater than that of the item being inserted, and increment those ranks by one. However, I think this approach will run into all sorts of problems---like race conditions.
Is there a more standard practice or more robust approach?
Note: The rank column is completely independent of rest of the columns, and insertion is not the only operation I need to support. Think of it as the back-end of a sortable to-do list, and the user can add/delete/reorder the items on the fly.
Doing verbatim what you suggest might be difficult or not possible at all, but I can suggest a workaround. Maintain a new column ts which stores the time a record is inserted. Then, insert the current time along with rest of the record, i.e.
id rank text ts
--- ----- ----- --------------------
x 0 Hello 2017-12-01 12:34:23
x 1 World 2017-12-03 04:20:01
x 2 Foo ...
x 3 New Item! 2017-12-12 11:26:32
x 3 Bar 2017-12-10 14:05:43
x 4 Baz ...
Now we can easily generate the ordering you want via a query:
SELECT id, rank, text,
ROW_NUMBER() OVER (ORDER BY rank, ts DESC) new_rank
FROM yourTable;
This would generate 0 to 5 ranks in the above sample table. The basic idea is to just use the already existing rank column, but to let the timestamp break the tie in ordering should the same rank appear more than once.
you can wrap it up to function if you think its worth of:
t=# with u as (
update r set rank = rank + 1 where rank >= 3
)
insert into r values('x',3,'New val!')
;
INSERT 0 1
the result:
t=# select * from r;
id | rank | text
----+------+----------
x | 0 | Hello
x | 1 | World
x | 2 | Foo
x | 3 | New val!
x | 4 | Bar
x | 5 | Baz
(6 rows)
also worth of mention you might have concurrency "chasing condition" problem on highly loaded systems. the code above is just a sample
You can have a “computed rank” which is a double precision and a “displayed rank” which is an integer that is computed using the row_number window function on output.
When a row is inserted that should rank between two rows, compute the new rank as the arithmetic mean of the two ranks.
The advantage is that you don't have to update existing rows.
The down side is that you have to calculate the displayed ranks before you can insert a new row so that you know where to insert it.
This solution (like all others) are subject to race conditions.
To deal with these, you can either use table locks or serializable transactions.
The only way to prevent a race condition would be to lock the table
https://www.postgresql.org/docs/current/sql-lock.html
Of course this would slow you down if there are lots of updates and inserts.
If can somehow limit the scope of your updates then you can do a SELECT .... FOR UPDATE on that scope. For example if the records have a parent_id you can do a select for update on the parent record first and any other insert who does the same select for update would have to wait till your transaction is done.
https://www.postgresql.org/docs/current/explicit-locking.html#:~:text=5.-,Advisory%20Locks,application%20to%20use%20them%20correctly.
Read the section on advisory locks to see if you can use those in your application. They are not enforced by the system so you'll need to be careful of how you write your application.

How to pass dictionary into query constraint?

If this is the dictionary of constraint:
dictName:`region`Code;
dictValue:(`NJ`NY;`EEE213);
dict:dictName!dictValue;
I would like to pass the dict to a function and depending on how many keys there are and let the query react accordingly. If there is one key region, then I would like to put it as
select from table where region in dict`region;
The same thing is for code. But if I pass two keys, I would like the query knows and pass it as:
select form table where region in dict`region,Code in dict`code;
Is there any way to do this?
I came up this code:
funcForOne:{[constraint]?[`bce;enlist(in;constraint;(`dict;enlist constraint));0b;()]};
funcForAll[]
{[dict]$[(null dict)~1;select from bce;($[(count key dict)=1;($[`region in (key dict);funcForOne[`region];funcForOne[`Code]]);select from bce where region in dict`region,rxmCode in dict`Code])]};
It works for one and two constraint. but when I called funcForAll[] it gives type error. How should I change it? i think it is from null dict~1
I tried count too. but doesn't work too well.
Update
So I did this but I have some error
tab:([]code:`B90056`B90057`B90058`B90059;region:`CA`NY`NJ`CA);
dictKey:`region`Code;dictValue:(`NJ`NY;`B90057);
dict:dictKey!dictValue;
?[tab;f dict;0b;()];
and I got 'NY error. Do you know why? Also,if I pass a null dictionary it doesn't seem working.
As I said funtional form would be the better approach but if your requirement is very limited as you said then you can consider other solution as below:
Note: Assuming all dictionary keys will be in table columns list.
q) f:{[dict] if[0=count dict;:select from t];
select from t where (#[key dict;t]) in {$[any 0<=type each value x;flip ;enlist ]x}[dict] }
Explanation:
1. convert dict to table depending on the values type. Flip if any value is a general list else enlist.
$[any 0<=type each value dict;flip ;enlist ]dict
Get subset of table t which consists only of dictionary keys as columns.
#[key dict;t]
get rows where (2) in (1)
Basically we are using below form of querying and matching:
q)t1:([]id:1 2;s:`a`b);
q)t2:([]id:1 3 ;s:`a`b);
q)select from t1 where ([]id;s) in t2
If you're just using in, you can do something like:
f:{{[x;y](in),'key[y],'(),x}[;x]enlist each value[x]}
So that:
q)d
a| 10 1
b| ,`a
q)f d
in `a 10 1
in `b ,`a
q)t
a b c
------
1 a 10
2 b 20
3 c 30
q)?[t;f d;0b;()]
a b c
------
1 a 10
Note that because of the enlist each the resulting list is enlisted so that singletons work too:
q)d:enlist[`a]!enlist 1
q)d
a| 1
q)?[t;f d;0b;()]
a b c
------
1 a 10
Update to secondary question
This still works with empty dict, i.e. ()!(). I'm passing in the dictionary variable.
In your 2nd question your dictionary is not constructed correctly (also remember q is case sensitive). Also your values need to be enlisted. Look up functional select in the reference pages on the kx site, you'll see that you need to enlist the symbol lists to differentiate them from column name declarations
`region`code!(enlist `NY`NJ;enlist `B90057)

Perl + PostgreSQL-- Selective Column to Row Transpose

I'm trying to find a way to use Perl to further process a PostgreSQL output. If there's a better way to do this via PostgreSQL, please let me know. I basically need to choose certain columns (Realtime, Value) in a file to concatenate certains columns to create a row while keeping ID and CAT.
First time posting, so please let me know if I missed anything.
Input:
ID CAT Realtime Value
A 1 time1 55
A 1 time2 57
B 1 time3 75
C 2 time4 60
C 3 time5 66
C 3 time6 67
Output:
ID CAT Time Values
A 1 time 1,time2 55,57
B 1 time3 75
C 2 time4 60
C 3 time5,time6 66,67
You could do this most simply in Postgres like so (using array columns)
CREATE TEMP TABLE output AS SELECT
id, cat, ARRAY_AGG(realtime) as time, ARRAY_AGG(value) as values
FROM input GROUP BY id, cat;
Then select whatever you want out of the output table.
SELECT id
, cat
, string_agg(realtime, ',') AS realtimes
, string_agg(value, ',') AS values
FROM input
GROUP BY 1, 2
ORDER BY 1, 2;
string_agg() requires PostgreSQL 9.0 or later and concatenates all values to a delimiter-separated string - while array_agg() (v8.4+) creates am array out of the input values.
About 1, 2 - I quote the manual on the SELECT command:
GROUP BY clause
expression can be an input column name, or the name or ordinal number
of an output column (SELECT list item), or ...
ORDER BY clause
Each expression can be the name or ordinal number of an output column
(SELECT list item), or
Emphasis mine. So that's just notational convenience. Especially handy with complex expressions in the SELECT list.