How to cast number to symbol in KDB?

A column has undergone a data type change, so the query has to be changed too:
Old query:
select from person where id = 100
New query:
select from person where id = `100
I'm new to Q and haven't been able to figure out how to do this:
Example:
I want to convert 100 to `100.

You would need to convert to string first and then cast to symbol:
q)`$string 100
`100
However, storing a numerical column as symbols is generally a bad idea. Is this table being written to disk? If so, this could bloat your sym file and your in-memory interned symbol list (increasing your memory usage), assuming the numerical values are not highly repetitive.
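If you do go ahead, it is worth checking first how repetitive the values are, since every distinct value becomes an interned symbol. A quick sketch in q (`x` stands in for your numeric column):

```q
q)x:1000000?500              / example: a million ids drawn from only 500 distinct values
q)(count distinct x)%count x / ratio near 0 means highly repetitive, safer to intern
```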

Related

Can't parse JSONB scalar as array in Postgres

I'm on PostgreSQL 13.2. I have a table with a JSONB column, and it stores JSONs that are a list of objects:
"[{\"MyKey\":\"ValueXYZ\",\"Counter\":0}, {\"MyKey\":\"ValueABC\",\"Counter\":3}]"
When I test this column for a type with jsonb_typeof() like so:
select jsonb_typeof(i.my_column) as col_type
from items i
where i.id = 342
I get string. Which tells me this value is a scalar, and I'm wondering if maybe it wasn't inserted properly.
The error appears when I try to parse the column with something like this:
select jsonb_array_elements(i.my_column)
from items i
and I see the error:
SQL Error [22023]: ERROR: cannot extract elements from a scalar
What is going on? Is there a way to fix this?
Yes, it got inserted wrong. It contains a scalar, which happens to be holding the string representation of a JSON array. The string representation of a JSON array is a different thing than an actual JSON array.
You can extract the string value from the scalar, then cast that string into a jsonb. #>>'{}' will extract the string out of a scalar.
select jsonb_array_elements((i.my_column#>>'{}')::jsonb)
from items i ;
Although you should fundamentally fix the problem by re-storing the values correctly.
update items set my_column = (my_column#>>'{}')::jsonb where jsonb_typeof(my_column)='string';
But of course you should fix whatever is doing the incorrect insertions first.
Looks like a quoting issue. Consider:
test=> SELECT jsonb_typeof(jsonb '[{"MyKey":"ValueXYZ","Counter":0}, {"MyKey":"ValueABC","Counter":3}]') AS col_type;
col_type
----------
array
(1 row)

Cast char list column to symbol

I have a table with ~3 billion rows in an hdb. One of the columns is a char list; I want to cast this column to symbol after loading the hdb. But memory quickly crosses 300GB, which I cannot afford. Can this be optimized in any way?
Are you trying to cast to symbol in-memory (temporary) or on-disk (permanent)? If in-memory, you shouldn't cast to symbol across all dates; instead, cast as you select (with a date filter) or build a wrapper function to handle this. You need to analyse how repetitive the strings are, though, as every string you cast to symbol gets interned and consumes memory. If the strings are highly unique (e.g. long) then you may end up creating too many interned symbols, leading to your memory blowup.
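A minimal sketch of such a wrapper in q (the table name `t` and string column `s` are assumptions):

```q
/ select a single date and cast the char-list column s to symbol on the way out
getDay:{[d] update s:`$s from select from t where date=d}
```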
If on-disk you should be using Kx's dbmaint utility - it has a specific example of casting from char list (string) to enumerated symbol.
https://github.com/KxSystems/kdb/blob/master/utils/dbmaint.md#fncol
You have to be very careful though - again you need to analyse the string column to ensure that it is repetitive enough to warrant casting to symbol (adding as few new symbols to the sym file as possible). If the strings are highly unique then you should not cast to symbol, as you risk polluting the sym file with a lot of new symbols.
Ultimately, the most efficient approach is to make the permanent on-disk change, assuming the strings are repetitive (e.g. short).

PostgreSQL queries treat ints as string datatypes

I store the following rows in my table ('DataScreen') under a JSONB column ('Results')
{"Id":11,"Product":"Google Chrome","Handle":3091,"Description":"Google Chrome"}
{"Id":111,"Product":"Microsoft Sql","Handle":3092,"Description":"Microsoft Sql"}
{"Id":22,"Product":"Microsoft OneNote","Handle":3093,"Description":"Microsoft OneNote"}
{"Id":222,"Product":"Microsoft OneDrive","Handle":3094,"Description":"Microsoft OneDrive"}
In these JSON objects, "Id" and "Handle" are integer properties, while the others are string properties.
When I query my table like below
Select Results->>'Id' From DataScreen
order by Results->>'Id' ASC
I get improper results because the ->> operator returns text, so PostgreSQL does the ordering according to the text value, not the integer value.
Hence it gives the result as
11,111,22,222
instead of
11,22,111,222.
I don't want to use explicit casting to retrieve like below
Select Results->>'Id' From DataScreen order by CAST(Results->>'Id' AS INT) ASC
because I cannot be sure of the column's datatype: the JSON structure is dynamic, and the keys and values may change next time. The same could happen with another JSON that mixes integer and string values.
I want integers in the JSON structure of the JSONB column to be treated as integers only, not as text (strings).
How do I write my query so that Id And Handle are retrieved as Integer Values and not as strings , without explicit casting?
I think your assumptions about the id field don't make sense. You said:
(a) Either id contains integers only or
(b) it contains strings and integers.
I'd say,
If (a) then numerical ordering is correct.
If (b) then lexical ordering is correct.
But if (a) holds for some time and then (b), the correct order changes too. And that doesn't make sense. Imagine:
For the current database you expect the order 11,22,111,222. Then you add a row
{"Id":"aa","Product":"Microsoft OneDrive","Handle":3095,"Description":"Microsoft OneDrive"}
and suddenly the correct order of the other rows changes to 11,111,22,222,aa. That sudden change is what bothers me.
So I would either expect a lexical ordering ab initio, or restrict my id field to integers and use explicit casting.
Every other option I can think of is just not practical. You could, for example, create a custom < and > implementation for your id field which results in 11,111,22,222,aa. ("Order all integers by numerical value and all strings by lexical order and put all integers before the strings").
But that is a lot of work (it involves a custom data type, a custom cast function and a custom operator function) and yields some counterintuitive results, e.g. 11,22,111,222,0a,1a,2a,aa (note the position of 0a and so on; they come after 222).
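Short of a custom type, that ordering can be approximated in plain SQL with a regex test (a sketch; it sorts numeric-looking ids numerically and places them before the rest):

```sql
select Results->>'Id' from DataScreen
order by (Results->>'Id') ~ '^[0-9]+$' desc,        -- integers first
         case when (Results->>'Id') ~ '^[0-9]+$'
              then (Results->>'Id')::numeric end,   -- numeric order within integers
         Results->>'Id'                             -- lexical order for the rest
```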
Hope that helps ;)
If Id is always an integer, you can cast it in the select list and just use ORDER BY 1:
select (Results->>'Id')::int From DataScreen order by 1 ASC
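Another option worth testing: order by the -> operator (which returns jsonb) instead of ->> (which returns text). jsonb comparison orders numbers numerically and defines a total order across types (strings sort before numbers), so even mixed ids sort deterministically:

```sql
Select Results->>'Id' From DataScreen order by Results->'Id' ASC
```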

Temp Tables Calculating Fields

I am joining two tables and outputting to a CSV file. This has worked OK, but I would like to create a calculated field (an integer field multiplied by a decimal field) and output that as one of the columns.
I am struggling at the moment to calculate the field and store it.
CREATE TEMP-TABLE tth2.
tth2:CREATE-LIKE(buf-woins-hndl).
tth2:ADD-LIKE-FIELD("ttqtyhrs","work_order.est_ltime").
tth2:TEMP-TABLE-PREPARE("ordx2").
bh2 = tth2:DEFAULT-BUFFER-HANDLE.
FOR EACH wo_instr NO-LOCK:
    bh2:BUFFER-CREATE.
    bh2:BUFFER-COPY(buf-woins-hndl).
    ASSIGN bh2:BUFFER-VALUE("ttqtyhrs") = bh2:BUFFER-VALUE("craft_nbr") *
           bh2:BUFFER-VALUE("std_hrs").
END.
I am trying store the result of the calculation in temp table field ttqtyhrs
I get an error message
Invalid datatype for argument to method 'BUFFER-VALUE'. Expecting 'integer' (5442)
when I try to compile.
I would be grateful for any pointers
Andy
Most likely you want to do something like this:
ASSIGN
bh2:BUFFER-FIELD("ttqtyhrs"):BUFFER-VALUE() = bh2:BUFFER-FIELD("craft_nbr"):BUFFER-VALUE() * bh2:BUFFER-FIELD("std_hrs"):BUFFER-VALUE().
BUFFER-VALUE takes an integer argument representing the index if the field is an extent/array. You need to pinpoint the field with BUFFER-FIELD!

Char vs Symbol KDB parted splay table

I am creating a new table on a KDB database as a parted splay (parted by date). The new table schema has a column called CCYY, which has a lot of repeating values. I am unsure whether I should save it as a char list or as symbols. My main goal is to use the least amount of memory.
As a result which one should I use? What is the benefit/disadvantage of saving repeating values as either a char array or a symbol in a parted splayed setup?
It sounds like you should use symbol.
There's a guide to symbols/enumerations here: http://www.timestored.com/kdb-guides/strings-symbols-enumeration#when-to-use Quote:
Typically you should follow the guidelines:
If the column is used in where clause equality comparisons e.g.
select from t where sym in `AB -> Symbol
Short, often repeated strings -> Symbol
Else Long, Non-repeated strings -> String
When evaluating whether or not to use symbol for a column, cardinality of that column is key. Length of individual values matters less and, if anything, longer values might be better off as symbol, as they will be stored only once in the sym file but repeated in the char vector. That consideration is pretty much moot if you compress your data on disk, though.
If your values are short enough, don't forget about the possibility of using .Q.j10, .Q.x10, .Q.j12 and .Q.x12. This will use less space than a char vector. And it doesn't rely on a sym file, which in complex environments can save you from having to re-enumerate if you are, say, copying tables between hdbs whose sym files are not in sync.
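A sketch of the round trip (assuming the values fit the encoding: .Q.j10/.Q.x10 pack strings of up to 10 characters from a restricted alphabet into a long, .Q.j12/.Q.x12 handle up to 12):

```q
q)v:.Q.j10 each ("USD";"EUR";"GBP")  / pack each currency code into a long atom
q).Q.x10 each v                      / decode the longs back to char lists
```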
If space is a concern, always compress the data on disk.