Combining a non-*-projection and "returning" in Slick - postgresql

I have two working statements to insert a row into a table named document with Slick.
The first one inserts all columns, taken from the row object and returns the ID generated by Postgres:
(Tables.Document returning Tables.Document.map(_.id)) +=
DocumentRow(id=-1, name="Name", fulltext="The text")
The second one ignores the column named fulltext and only inserts the name but does not return the generated ID:
Tables.Document.map(r => (r.name)) += ("Name")
How can I combine both (limiting the insert to a subset of columns and returning the generated ID at the same time)?
Background:
The reason why I want to exclude the fulltext column from the insert is the fact that it is of Postgres type tsvector, but the generated Slick code treats it as a String. At insert time the value (even if null or None) is converted into some text type which is incompatible with tsvector and raises an exception. I found no solution to insert a tsvector without an additional library. Please comment if you think there is and I should be rather following this path.

Although I believe the right way is to fix tsvector issue, I don't have enough experience with Postres to help you with it. As for your workaround, you can do it and the code should look something like this:
(Tables.Document.map(r => (r.name)) returning Tables.Document.map(_.id)) += ("Name")
If you split it into parts you can see that first you create a Query as in your second example but then rather than apply += to it immediately you first chain it with returning and only then call +=.
P.S. What is the issue with using some additional Postres-aware library?

Related

PostgreSQL: Return auto-generated ids from COPY FROM insertion

I have a non-empty PostgreSQL table with a GENERATED ALWAYS AS IDENTITY column id. I do a bulk insert with the C++ binding pqxx::stream_to, which I'm assuming uses COPY FROM. My problem is that I want to know the ids of the newly created rows, but COPY FROM has no RETURNING clause. I see several possible solutions, but I'm not sure if any of them is good, or which one is the least bad:
Provide the ids manually through COPY FROM, taking care to give the values which the identity sequence would have provided, then afterwards synchronize the sequence with setval(...).
First stream the data to a temp-table with a custom index column for ordering. Then do something likeINSERT INTO foo (col1, col2)
SELECT ttFoo.col1, ttFoo.col2 FROM ttFoo
ORDER BY ttFoo.idx RETURNING foo.id
and depend on the fact that the identity sequence produces ascending numbers to correlate them with ttFoo.idx (I cannot do RETURNING ttFoo.idx too because only the inserted row is available for that which doesn't contain idx)
Query the current value of the identity sequence prior to insertion, then check afterwards which rows are new.
I would assume that this is a common situation, yet I don't see an obviously correct solution. What do you recommend?
You can find out which rows have been affected by your current transaction using the system columns. The xmin column contains the ID of the inserting transaction, so to return the id values you just copied, you could:
BEGIN;
COPY foo(col1,col2) FROM STDIN;
SELECT id FROM foo
WHERE xmin::text = (txid_current() % (2^32)::bigint)::text
ORDER BY id;
COMMIT;
The WHERE clause comes from this answer, which explains the reasoning behind it.
I don't think there's any way to optimise this with an index, so it might be too slow on a large table. If so, I think your second option would be the way to go, i.e. stream into a temp table and INSERT ... RETURNING.
I think you can create id with type is uuid.
The first step, you should random your ids after that bulk insert them, by this way your will not need to return ids from database.

Fast way to check if PostgreSQL jsonb column contains certain string

The past two days I've been reading a lot about jsonb, full text search, gin index, trigram index and what not but I still can not find a definitive or at least a good enough answer on how to fastly search if a row of type JSONB contains certain string as a value. Since it's a search functionality the behavior should be like that of ILIKE
What I have is:
Table, lets call it app.table_1 which contains a lot of columns one of which is of type JSONB, so lets call it column_jsonb
The data inside column_jsonb will always be flatten (no nested objects, etc) but the keys can vary. An example of the data in the column with obfuscated values looks like this:
"{""Key1"": ""Value1"", ""Key2"": ""Value2"", ""Key3"": null, ""Key4"": ""Value4"", ""Key5"": ""Value5""}"
I have a GIN index for this column which doesn't seems to affect the search time significantly (I am testing with 20k records now which takes about 550ms). The indes looks like this:
CREATE INDEX ix_table_1_column_jsonb_gin
ON app.table_1 USING gin
(column_jsonb jsonb_path_ops)
TABLESPACE pg_default;
I am interested only in the VALUES and the way I am searching them now is this:
EXISTS(SELECT value FROM jsonb_each(column_jsonb) WHERE value::text ILIKE search_term)
Here search_term is variable coming from the front end with the string that the user is searching for
I have the following questions:
Is it possible to make the check faster without modifying the data model? I've read that trigram index might be usfeul for similar cases but at least for me it seems that converting jsonb to text and then checking will be slower and actually I am not sure if the trigram index will actually work if the column original type is JSONB and I explicitly cast each row to text? If I'm wroing I would really appreciate some explanation with example if possible.
Is there some JSONB function that I am not aware of which offers what I am searching for out of the box, I'm constrained to PostgreSQL v 11.9 so some new things coming with version 12 are not available for me.
If it's not possible to achieve significant improvement with the current data structure can you propose a way to restructure the data in column_jsonb maybe another column of some other type with data persisted in some other way, I don't know...
Thank you very much in advance!
If the data structure is flat, and you regularly need to search the values, and the values are all the same type, a traditional key/value table would seem more appropriate.
create table table1_options (
table1_id bigint not null references table1(id),
key text not null,
value text not null
);
create index table1_options_key on table1_options(key);
create index table1_options_value on table1_options(value);
select *
from table1_options
where value ilike 'some search%';
I've used simple B-Tree indexes, but you can use whatever you need to speed up your particular searches.
The downsides are that all values must have the same type (doesn't seem to be a problem here) and you need an extra table for each table. That last one can be mitigated somewhat with table inheritance.

Handling the output of jsonb_populate_record

I'm a real beginner when it comes to SQL and I'm currently trying to build a database using postgres. I have a lot of data I want to put into my database in JSON files, but I have trouble converting it into tables. The JSON is nested and contains many variables, but the behavior of jsonb_populate_record allows me to ignore the structure I don't want to deal with right now. So far I have:
CREATE TABLE raw (records JSONB);
COPY raw from 'home/myuser/mydocuments/mydata/data.txt'
create type jsonb_type as (time text, id numeric);
create table test as (
select jsonb_populate_record(null::jsonb_type, raw.records) from raw;
When running the select statement only (without the create table) the data looks great in the GUI I use (DBeaver). However it does not seem to be an actual table as I cannot run select statements like
select time from test;
or similar. The column in my table 'test' also is called 'jsonb_populate_record(jsonb_type)' in the GUI, so something seems to be going wrong there. I do not know how to fix it, I've read about people using lateral joins when using json_populate_record, but due to my limited SQL knowledge I can't understand or replicate what they are doing.
jsonb_populate_record() returns a single column (which is a record).
If you want to get multiple columns, you need to expand the record:
create table test
as
select (jsonb_populate_record(null::jsonb_type, raw.records)).*
from raw;
A "record" is a a data type (that's why you need create type to create one) but one that can contain multiple fields. So if you have a column in a table (or a result) that column in turn contains the fields of that record type. The * then expands the fields in that record.

select all columns except two in q kdb historical database

In output I want to select all columns except two columns from a table in q/kdb historical database.
I tried running below query but it does not work on hdb.
delete colid,coltime from table where date=.z.d-1
but it is failing with below error
ERROR: 'par
(trying to update a physically partitioned table)
I referred https://code.kx.com/wiki/Cookbook/ProgrammingIdioms#How_do_I_select_all_the_columns_of_a_table_except_one.3F but no help.
How can we display all columns except for two in kdb historical database?
The reason you are getting par error is due to the fact that it is a partitioned table.
The error is documented here
trying to update a partitioned table
You cannot directly update, delete anything on a partitioned table ( there is a separate db maintenance script for that)
The query you have used as fix is basically selecting the data first in-memory (temporarily) and then deleting the columns, hence it is working.
delete colid,coltime from select from table where date=.z.d-1
You can try the following functional form :
c:cols[t] except `p
?[t;enlist(=;`date;2015.01.01) ;0b;c!c]
Could try a functional select:
?[table;enlist(=;`date;.z.d);0b;{x!x}cols[table]except`colid`coltime]
Here the last argument is a dictionary of column name to column title, which tells the query what to extract. Instead of deleting the columns you specified this selects all but those two, which is the same query more or less.
To see what the functional form of a query is you can run something like:
parse"select colid,coltime from table where date=.z.d"
And it will output the arguments to the functional select.
You can read more on functional selects at code.kx.com.
Only select queries work on partitioned tables, which you resolved by structuring your query where you first selected the table into memory, then deleted the columns you did not want.
If you have a large number of columns and don't want to create a bulky select query you could use a functional select.
?[table;();0b;{x!x}((cols table) except `colid`coltime)]
And show all columns except a subset of columns. The column clause expects a dictionary hence I am using the function {x!x} to convert my list to a dictionary. See more information here
https://code.kx.com/q/ref/funsql/
As nyi mentioned, if you want to permanently delete columns from an historical database you can use the deleteCol function in the dbmaint tools https://github.com/KxSystems/kdb/blob/master/utils/dbmaint.md

Retrieve all columns in PostgreSQL while specifically formatting only one

I'm using redash on a PostgreSQL db and for some of our tables, querying all columns using SELECT * returns an error akin to column X is not JSON serializable. My understanding is that it's because a column contains a datetime range (and doesn't play well with redash).
Is there a way in postgres to easily retrieve all columns, while converting a specfic one to text? Or even converting all columns to text?
So far, in order to do this, I have to list out every column name and convert the culprit to text.
My hope is finding something along the lines of:
SELECT created_at::text, *
or...
SELECT *::text
Where the wonky column (created_at) is converted to text, and somehow appended to everything else.