How to splay a column of type dictionary - kdb

I have a column in my table whose values are dictionaries. The type shown in the meta of that column is " " (blank).
I want to know how to splay this table. When I try to splay it, I get a type error. I am aware that only vectors can be splayed; however, I have seen a splayed table with a dictionary column before, so I know it's possible, but I am not sure how it is done.

Dictionary columns in splayed tables are only supported in kdb+ version 3.6 and above.
If you are running 3.6/4.0, double-check that you are enumerating the table before splaying (note the trailing slash, which is what makes set splay):
`:path/to/table/ set .Q.en[`:hdb;table]
If you are on a version below 3.6, storing the dictionaries as JSON strings is a good alternative, although it is not recommended on large tables, as .j.k (deserialisation) is slow.
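A minimal end-to-end sketch (paths and data are hypothetical; the direct write assumes 3.6+):
t:([] sym:`a`b; d:(`x`y!1 2;`x`y!3 4));           / table with a dictionary column
`:/path/to/hdb/t/ set .Q.en[`:/path/to/hdb] t;    / enumerate symbol columns, then splay
On older versions, convert the dictionary column to JSON strings first and parse with .j.k on read:
`:/path/to/hdb/t/ set .Q.en[`:/path/to/hdb] update d:.j.j each d from t;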

Related

How to understand the return type?

I'm building a framework for rust-postgres.
I need to know what value type will be returned from a row.try_get, so I can put the value in a variable of the appropriate type.
I can get the SQL type from row.columns()[index].type, but not whether the value is nullable, so I can't decide whether to put the value in a plain type or an Option<T>.
I can only use the contents of the row itself to work this out; I can't do things like "get the table structure from PostgreSQL".
Is there a way?
The reason that the Column type does not expose any way to find out whether a result column is nullable is that the database does not return this information.
Remember that result columns are derived from running a query, and that query may contain arbitrary expressions. If the query was a simple SELECT of columns from a table, then it would be reasonably simple to determine if a column could be nullable.
But it could also be a very complex expression, derived from multiple columns, subselects or even custom functions. Postgres can figure out the data type of each column, but in the general case it doesn't know if a result column may contain nulls.
If your application is only performing simple queries, and you know which table column each result column comes from, then you can find out if that table column is nullable like this:
SELECT is_nullable
FROM information_schema.columns
WHERE table_schema='myschema'
AND table_name='mytable'
AND column_name='mycolumn';
If your queries are not that simple then I recommend you always get the result as an Option<T> and handle the possibility that the result might be None.
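Concretely, that looks like the sketch below, assuming the synchronous postgres crate (the connection string, query, and column are placeholders):

use postgres::{Client, Error, NoTls};

fn main() -> Result<(), Error> {
    let mut client = Client::connect("host=localhost user=postgres", NoTls)?;
    let row = client.query_one("SELECT mycolumn FROM mytable LIMIT 1", &[])?;
    // FromSql is implemented for Option<T>, so try_get succeeds
    // whether the column value is NULL or not.
    let value: Option<String> = row.try_get(0)?;
    match value {
        Some(s) => println!("got: {}", s),
        None => println!("value was NULL"),
    }
    Ok(())
}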

How to efficiently retrieve rows with large JSON objects in Postgres

I've inherited the task of retrieving data from a Postgres table.
The table has ~1m rows, and there are about 145k rows that I wish to retrieve. These 145k rows share a common string in one of their columns, batch_name, that I can use to search for them.
The table has two columns payload & result that are of type JSON. The result column contains the data that I wish to retrieve.
When I make even the simplest queries to the table:
SELECT * FROM table_name WHERE batch_name = 'an_id' LIMIT 10
The request takes ~7-10 seconds to return data.
This is despite the fact that the batch_name column has an index on it and is of type varchar(255).
Whilst investigating this, I've discovered that the JSON objects in the result and payload columns can be absolutely gigantic; when prettified, they are sometimes ~27k lines long.
These gigantic JSON objects seem to be the root cause of the problem.
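(One way to confirm this, using the column names above, is to measure the stored datum sizes with pg_column_size:)
SELECT avg(pg_column_size(result))  AS avg_result_bytes,
       avg(pg_column_size(payload)) AS avg_payload_bytes
FROM table_name
WHERE batch_name = 'an_id';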
My questions are:
What can I do to improve the efficiency of this query? Or is the ultimate solution here to just modify the table such that we are no longer storing gigantic JSON objects?
Given that I don't need to actually query fields in these JSON objects (but I DO need to retrieve them), would simply storing them as strings improve efficiency?
Why is storing large JSON objects SO inefficient?
Thanks in advance for any help, it's much appreciated.

How to find an arbitrary key within a postgres jsonb object?

There are several operators in postgres for getting elements at a certain path in jsonb.
But how could I retrieve all the values that have a key of 'foo', if I don't know where in the whole object structure they will appear?
I saw there is a regex matching function that would return matches, but the object keyed off foo could be arbitrarily complex, so it would be tough to come up with a regex that pulls the whole object out neatly.
Thanks for your help
SELECT jsonb_column->'foo'
FROM my_table                -- "table" is a reserved word; use your actual table name
WHERE jsonb_column ? 'foo'   -- optional: skips rows without a top-level "foo" key
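Note that -> only matches a top-level key. On Postgres 12+, a jsonpath query can pull out "foo" values at any depth (same placeholder names; strict mode avoids the duplicate matches that lax-mode array unwrapping can produce with .**):
SELECT jsonb_path_query(jsonb_column, 'strict $.**.foo')
FROM my_table;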

How to correctly enum and partition a kdb table?

I put together a few lines to partition my kdb table, which of course contains string columns and thus must be enumerated.
I wonder if this code is completely correct or if it can be simplified further. In particular, I have some doubt about the need to create a partitioned table schema, given that the memory table and the disk table will have exactly the same layout. Also, there might be a way to avoid creating the temporary tbl_mem and tbl_mem_enum tables:
...
tbl_mem: select ts,sym,msg_type from oms_mem lj sym_mem;
tbl_mem_enum: .Q.en[`$sym_path] tbl_mem;
delete tbl_mem from `.;
(`$db;``!((17;2;9);(17;2;9))) set ([]ts:`time$(); ticker:`symbol$(); msg_type:`symbol$());
(`$db) upsert (select ts,ticker:sym,msg_type from tbl_mem_enum)
delete tbl_mem_enum from `.;
PS: I know, I shouldn't use "_" to name variables, but then what do I use to separate words in a variable or function name? . is also a kdb function.
I think you mean that your table contains symbol columns - these are the columns that you need to enumerate (strings don't need enumeration). You can do the write and enumeration in a single step. Also if you are using the same compression algo/level on all columns then it may be easier to just use .z.zd:
.z.zd:17 2 9i;   / default compression for all writes: 2^17-byte blocks, algorithm 2 (gzip), level 9
(`$db) set .Q.en[`$sym_path] select ts, ticker:sym, msg_type from oms_mem lj sym_mem;
It's generally recommended to use camelCase instead of '_'. Some useful info here: http://www.timestored.com/kdb-guides/q-coding-standards
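For an actual date-partitioned database, the same one-step write simply targets a partition directory instead; a sketch with hypothetical paths and date (the trailing slash makes set splay):
.z.zd:17 2 9i;
`:/path/to/hdb/2024.01.15/oms/ set .Q.en[`:/path/to/hdb] select ts, ticker:sym, msg_type from oms_mem lj sym_mem;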

How to alter Postgres table data based on its contents?

This is probably a super simple question, but I'm struggling to come up with the right keywords to find it on Google.
I have a Postgres table that has, among its contents, a column of type text named content_type, which stores what type of entry is held in that row.
There are only about 5 different types, and I decided I want to change one of them to display as something else in my application (I had been directly displaying these).
It struck me as funny that my view was being dictated by my database model, so I decided to convert the types stored in my database from strings into integers, and to enumerate the possible types in my application with constants that map them to their display names. That way, if I ever get the urge to change any category names again, I can do it by altering a single constant. I also have a hunch that storing integers might be somewhat more efficient than storing text in the database.
First, a quick threshold question of, is this a good idea? Any feedback or anything I missed?
Second, and my main question, what's the Postgres command I could enter to make an alteration like this? I'm thinking I could start by renaming the old content_type column to old_content_type and then creating a new integer column content_type. However, what command would look at a row's old_content_type and fill in the new content_type column based off of that?
If you're finding that you need to change the display values, then yes, it's probably a good idea not to store them in the database. Integers are also more efficient to store and search, but I really wouldn't worry about that unless you've got millions of rows.
You just need to run an update to populate your new column:
UPDATE table_name
SET content_type = CASE
        WHEN old_content_type = 'a' THEN 1
        WHEN old_content_type = 'b' THEN 2
        ELSE 3
    END;
If you're on Postgres 8.4 then using an enum type instead of a plain integer might be a good idea.
Ideally you'd have these fields referring to a table containing the definitions of the types, via a foreign key constraint. That way you know your database is clean and has no invalid values (i.e. you have referential integrity). A sketch of this follows the list below.
There are many ways to handle this:
1. Having a table for each field that can contain a number of values (i.e. like an enum) is the most obvious, but it breaks down when you have a table that requires many such attributes.
2. You can use the Entity-Attribute-Value model, but beware that it is easy to abuse and causes problems when things grow.
3. You can use (or refer to) my implementation, PET (Parameter Enumeration Tables), which is a halfway house between 1 and 2.
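A minimal sketch of the lookup-table approach from option 1 (all names are hypothetical):
CREATE TABLE content_types (
    id   integer PRIMARY KEY,
    name text NOT NULL UNIQUE   -- the display name lives here, not in the main table
);
INSERT INTO content_types (id, name) VALUES (1, 'article'), (2, 'photo'), (3, 'video');
ALTER TABLE my_table
    ADD COLUMN content_type integer REFERENCES content_types (id);   -- invalid ids are rejected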