Making full text search with PostgreSQL and Hibernate - missing type for tsvector and operator @@ - postgresql

I have a PostgreSQL database where I'd like to run full text search queries using the JPA/Hibernate criteria API. My issue is that I don't know which type I should use for the tsvector column, or what the replacement for the @@ operator is.
I created a database view which combines data from two tables and makes a concatenated tsvector:
CREATE OR REPLACE VIEW node_name_description_tags AS
SELECT nodeId, document
FROM (SELECT node.id as nodeId,
to_tsvector('english', node.name) ||
to_tsvector('english', coalesce(node.description, ' ')) ||
to_tsvector('english', coalesce(string_agg(tag.name, ' '), ' ')) as document
FROM node
JOIN tag_node ON node.id = tag_node.node_id
JOIN tag ON tag.id = tag_node.tag_id
GROUP BY nodeId) as documents
Then I can run queries on it like this and it returns what I expect:
SELECT * FROM node_name_description_tags WHERE document @@ PLAINTO_TSQUERY('english', 'integration user administration file')
What I was going to do next was create a Hibernate entity mapped to this view, but I don't know which type to use for the tsvector column. Then I was going to create a Hibernate specification with a where clause, but I don't know how the @@ operator is implemented in Hibernate. It seems that this functionality isn't supported at all!
I found suggestions on the internet to use a custom dialect with an added full-text-search function that generates the @@ where clause. That's basically everything I have now.
Any advice on how to make this work from Hibernate?
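One way to wire this up is the custom-dialect approach mentioned above. The sketch below targets Hibernate 5 and assumes the view from the question; the names FtsPostgreSQLDialect, fts, and NodeNameDescriptionTags are invented for illustration, not part of any official API.
import org.hibernate.annotations.Immutable;
import org.hibernate.dialect.PostgreSQL95Dialect;
import org.hibernate.dialect.function.SQLFunctionTemplate;
import org.hibernate.type.StandardBasicTypes;
import javax.persistence.*;

// Custom dialect that teaches Hibernate a pseudo-function "fts" which
// renders the PostgreSQL @@ full-text match. Register it through the
// hibernate.dialect property.
public class FtsPostgreSQLDialect extends PostgreSQL95Dialect {
    public FtsPostgreSQLDialect() {
        registerFunction("fts", new SQLFunctionTemplate(
                StandardBasicTypes.BOOLEAN,
                "?1 @@ plainto_tsquery('english', ?2)"));
    }
}

// Read-only entity mapped to the view. tsvector has no standard JPA type,
// but mapping the column as String is enough when it is only referenced
// inside the fts() predicate and never read back.
@Entity
@Immutable
@Table(name = "node_name_description_tags")
class NodeNameDescriptionTags {
    @Id
    @Column(name = "nodeid")
    private Long nodeId;

    @Column(name = "document", columnDefinition = "tsvector")
    private String document;
}
A criteria predicate can then call the registered function, for example inside a specification's toPredicate (cb is the CriteriaBuilder, root the Root); the "= true" comparison that isTrue() generates is what makes the boolean pseudo-function usable in a where clause:
Predicate fullText = cb.isTrue(
        cb.function("fts", Boolean.class,
                root.get("document"), cb.literal(searchText)));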

In my case, I didn't want to do a full text search across all the fields. As a quick working solution I added a custom column against which the search queries run:
@Column(name = "full_text_search_index", columnDefinition = "TEXT", nullable = false)
In this column I store the fields needed for search and keep them consistent on entity updates (a kind of denormalization); search queries then just run in a 'contains' (LIKE) mode, as sketched below. This column is indexed, so it works quite well.
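For completeness, here is what that 'contains' query can look like as a Spring Data JPA specification. This is a sketch under assumptions: an entity Node with a fullTextSearchIndex field mapped to the column above; both names come from this answer's context, not from a library.
import org.springframework.data.jpa.domain.Specification;

// Case-insensitive 'contains' search over the denormalized column.
public static Specification<Node> containsTerm(String term) {
    return (root, query, cb) ->
            cb.like(cb.lower(root.<String>get("fullTextSearchIndex")),
                    "%" + term.toLowerCase() + "%");
}
Note that a LIKE pattern with a leading wildcard cannot use a plain b-tree index; in PostgreSQL a pg_trgm GIN index is the usual way to keep such 'contains' scans fast.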

Related

Equivalent of Postgres's STRING_TO_ARRAY in SQLAlchemy

I have this function where I take a string and split it into an array of words. It works great in Postgres, but I want to convert it to SQLAlchemy and I haven't been able to find a good alternative to STRING_TO_ARRAY. Are there any good workarounds that people have found?
Here is my basic query for reference:
SELECT type,
UNNEST(STRING_TO_ARRAY(description, ' ')) AS word
FROM item
From SQLAlchemy's documentation on SQL and Generic Functions, you should be able to use the function by name directly even if SQLAlchemy doesn't know about it.
Note that any name not known to func generates the function name as is - there is no restriction on what SQL functions can be called, known or unknown to SQLAlchemy, built-in or user defined.
from sqlalchemy import select, func, Table, Column, MetaData, String
metadata_obj = MetaData()
item = Table("item", metadata_obj, Column("description", String))
stmt = select(
func.unnest(
func.string_to_array(item.c.description, " ")
).label("word")
)
print(stmt.compile(compile_kwargs={"literal_binds": True}))
# SELECT unnest(string_to_array(item.description, ' ')) AS word
# FROM item
This remains a PostgreSQL-specific query, but I don't see how to make it dialect-agnostic given both UNNEST and STRING_TO_ARRAY.

PostgreSQL: allow filtering by non-existing fields

I'm using PostgreSQL with a Go driver. Sometimes I need to query fields that may not exist, just to check whether something exists in the DB. Before querying, I can't tell whether a given field exists. Example:
where size=10 or length=10
By default I get the error column "length" does not exist; however, the size column could exist and I could get some results.
Is it possible to handle such cases and return whatever is possible?
EDIT:
Yes, I could get all the existing columns first. But the initial queries can be rather complex and are not created by me directly; I can only modify them.
That means the query can be simple like the previous example and can be much more complex like this:
WHERE size=10 OR (length=10 AND n='example') OR (c BETWEEN 1 and 5 AND p='Mars')
If the missing columns are length and c, does that mean I have to parse the SQL, split it by OR (or other operators), check every part of the query, remove any part with missing columns, and in the end generate a new SQL query?
Any easier way?
I would check the information schema first:
select column_name from information_schema.columns where table_name = 'table_name';
and then build the query based on the result.
Why don't you get a list of the columns that exist in the table first? Like this:
select column_name
from information_schema.columns
where table_name = 'table_name' and (column_name = 'size' or column_name = 'length');
The result will be the columns that exist.
There is no way to do what you want, except for constructing an SQL string from the list of available columns, which can be obtained by querying information_schema.columns.
SQL statements are parsed before they are executed, and there is no conditional compilation or short-circuiting, so you get an error if a non-existing column is referenced.
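The question is about Go, but the approach is language-agnostic: fetch the list of existing columns once, drop the predicates that reference missing ones, and only then build the SQL string. A rough sketch of the first step in Java/JDBC (all class and method names here are hypothetical):
import java.sql.*;
import java.util.*;

public class ColumnAwareFilter {
    // Returns which of the candidate columns actually exist in the table,
    // so predicates on missing columns can be dropped before querying.
    static Set<String> existingColumns(Connection conn, String table,
                                       Set<String> candidates) throws SQLException {
        Set<String> found = new HashSet<>();
        String sql = "SELECT column_name FROM information_schema.columns "
                   + "WHERE table_name = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, table);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    String name = rs.getString(1);
                    if (candidates.contains(name)) {
                        found.add(name);
                    }
                }
            }
        }
        return found;
    }
}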

Use criteria or query DSL to find all table names in a given schema

Is there a way to find all table names that begin with t_ in a given schema with the criteria API or query DSL (or even database metadata)? If there is, could you please show me how to do it using a schema name or view? I'm using PostgreSQL for the database.
I don't want to use a native query.
Yes, you can use the query below:
SELECT table_catalog,table_schema,table_name
FROM information_schema.tables
WHERE table_name LIKE 't\_%'
AND table_type='BASE TABLE' -- to filter out Tables only, remove if you need to see views as well
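Since the question explicitly rules out a native query, another option is to map information_schema.tables as a read-only entity and run the same filter through the criteria API. A sketch under assumptions: the entity and helper names are invented, and table_name is used as @Id only to keep the example short (the real key of the view is (table_catalog, table_schema, table_name)).
import javax.persistence.*;
import javax.persistence.criteria.*;
import java.util.List;

// Hypothetical read-only mapping of information_schema.tables.
@Entity
@Table(name = "tables", schema = "information_schema")
class InformationSchemaTable {
    @Id
    @Column(name = "table_name")
    String tableName;

    @Column(name = "table_schema")
    String tableSchema;

    @Column(name = "table_type")
    String tableType;
}

public class TableNameQueries {
    // All base tables in the given schema whose names start with "t_".
    // The explicit escape character makes the underscore match literally.
    static List<String> tablesWithTPrefix(EntityManager em, String schema) {
        CriteriaBuilder cb = em.getCriteriaBuilder();
        CriteriaQuery<String> q = cb.createQuery(String.class);
        Root<InformationSchemaTable> t = q.from(InformationSchemaTable.class);
        q.select(t.<String>get("tableName"))
         .where(cb.like(t.<String>get("tableName"), "t\\_%", '\\'),
                cb.equal(t.get("tableSchema"), schema),
                cb.equal(t.get("tableType"), "BASE TABLE"));
        return em.createQuery(q).getResultList();
    }
}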

Is there a way to describe an external/spectrum table via redshift?

In AWS Athena you can write
SHOW CREATE TABLE my_table_name;
and see a SQL-like query that describes how to build the table's schema. It works for tables whose schemas are defined in AWS Glue. This is very useful for creating tables in a regular RDBMS, and for loading and exploring the data.
Interacting with Athena in this way is manual, and I would like to automate the process of creating regular RDBMS tables that have the same schema as those in Redshift Spectrum.
How can I do this through a query that can be run via psql? Or is there another way to get this via the aws-cli?
Redshift Spectrum does not support the SHOW CREATE TABLE syntax, but there are system tables that can deliver the same information. I have to say, it's not as useful as the ready-to-use SQL returned by Athena, though.
The tables are
svv_external_schemas - gives you information about the Glue database mapping and the IAM roles bound to it
svv_external_tables - gives you the location information, plus the data format and serdes used
svv_external_columns - gives you the column names, types, and order information.
Using that data, you could reconstruct the table's DDL.
For example, to get the list of columns and their types in CREATE TABLE format, one can do:
select distinct
listagg(columnname || ' ' || external_type, ',\n')
within group ( order by columnnum ) over ()
from svv_external_columns
where tablename = '<YOUR_TABLE_NAME>'
and schemaname = '<YOUR_SCHEMA_NAME>'
The query gives you output similar to:
col1 int,
col2 string,
...
*) I am using the listagg window function and not the aggregate function, as apparently the listagg aggregate function can only be used with user-defined tables. Bummer.
I had been doing something similar to @botchniaque's answer in the past, but recently stumbled across a solution in the AWS Labs amazon-redshift-utils code package that seems to be more reliable than my hand-spun queries:
amazon-redshift-utils: v_generate_external_tbl_ddl
If you don't have the ability to create a view backed by the DDL listed in that package, you can run it manually by removing the CREATE statement from the start of the query. Assuming you can create it as a view, usage would be:
SELECT ddl
FROM admin.v_generate_external_tbl_ddl
WHERE schemaname = '<external_schema_name>'
-- Optionally include specific table references:
-- AND tablename IN ('<table_name_1>', '<table_name_2>', ..., '<table_name_n>')
ORDER BY tablename, seq
;
Redshift has since added SHOW EXTERNAL TABLE:
SHOW EXTERNAL TABLE external_schema.table_name [ PARTITION ]
SHOW EXTERNAL TABLE my_schema.my_table;
https://docs.aws.amazon.com/redshift/latest/dg/r_SHOW_EXTERNAL_TABLE.html

Getting a Result Set's Column Names via T-SQL

Is there a way to get the column names that an arbitrary query will return using just T-SQL, in a way that works with pre-2012 versions of Microsoft SQL Server?
What Doesn't Work:
sys.columns and INFORMATION_SCHEMA.COLUMNS work great for obtaining the column list for tables or views but don't work with arbitrary queries.
sys.dm_exec_describe_first_result_set would be perfect, except that this management function was added in SQL Server 2012. What I'm writing needs to be backwards compatible with SQL Server 2005.
A custom CLR function could easily provide this information but introduces deployment complexities on the server side. I'd rather not go this route.
Any ideas?
So long as the arbitrary query qualifies for use as a nested query (i.e. no CTEs, all column names unique, etc.), this can be achieved by loading the query's metadata into a temp table, then retrieving the column details from tempdb.sys.columns:
-- TOP 0 materializes the result set's columns without copying any rows
SELECT TOP 0 * INTO #t FROM (query goes here) q
-- Read the column names back from the temp table's metadata
SELECT name FROM tempdb.sys.columns WHERE object_id = OBJECT_ID('tempdb..#t')
DROP TABLE #t
Thanks to @MartinSmith for suggesting this approach!