Search through all columns of data - postgresql

I have a table, for example Products, and it has 10 columns. There are also other tables, which are related to Products.
For example:
Products <-> Keywords (M-M)
Products <-> Warehouse (1-M)
etc
I need to implement a search on all columns, as well as on the columns in the relationship Keywords and Warehouse.
I have a trivial SQL query:
FROM Product AS A
LEFT JOIN (...Keywords...) AS B
LEFT JOIN (...Warehouse...) AS C
WHERE A."Name" LIKE '%SOME TEXT%' OR A."RegistrationDate" LIKE '%SOME TEXT%' etc.
That is, the user enters a string, and I need to search all columns of the table (which contain int, datetime, and text types) for that string.
I think my implementation is very naive and not really optimized.
Is there any other way to search string in all columns?
I also had the idea of putting all of this into to_tsvector and using full text search. Is that a good idea? Note that some of my columns have DateTime and Int types.

A brute force method would be to convert the entire row to a string and then search that string:
SELECT ...
FROM Product AS a
LEFT JOIN (...Keywords...) AS B
LEFT JOIN (...Warehouse...) AS C
WHERE a::text LIKE '%SOME TEXT%'
Note that casting a row to a text value separates the column values with commas and encloses them in parentheses, e.g. a row with the values 1, Donald, Duck would become '(1,Donald,Duck)'.
So if you are looking for commas or parentheses you might get false positives.
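To make the row-to-text format concrete, here is a small sketch that uses an inline VALUES list in place of the Product table (the alias t and its column names are made up for illustration). It shows both a genuine match and the false positive caused by the separator:

```sql
-- Hypothetical inline row standing in for a Product row.
SELECT t::text               AS row_as_text,    -- (1,Donald,Duck)
       t::text LIKE '%Duck%' AS matches_value,  -- true: the text occurs in a column
       t::text LIKE '%,%'    AS matches_comma   -- true, although no column contains a comma
FROM (VALUES (1, 'Donald', 'Duck')) AS t(id, first_name, last_name);
```

Values that themselves contain commas, parentheses, double quotes, or whitespace are wrapped in double quotes in this representation, so patterns that span such characters may need extra care.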
Another option that is more accurate, but a bit more complicated, is to turn the row into JSON and then iterate over all JSON values:
SELECT ...
FROM Product AS a
LEFT JOIN (...Keywords...) AS B
LEFT JOIN (...Warehouse...) AS C
WHERE EXISTS (select *
from jsonb_each_text(to_jsonb(a)) as x(col,value)
where x.value LIKE '%SOME TEXT%')
Or you can use a JSON path query instead:
SELECT ...
FROM Product AS a
LEFT JOIN (...Keywords...) AS B
LEFT JOIN (...Warehouse...) AS C
WHERE jsonb_path_exists(to_jsonb(a), '$.* ? (@ like_regex "SOME TEXT")')
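The two JSON-based variants may behave differently for non-text columns. As far as I understand, like_regex only tests JSON string values, whereas jsonb_each_text converts every value to text first, so a number in an integer column should be found by the latter but not by the former. Here is a sketch to check this on your own server (the path functions need PostgreSQL 12 or later; the VALUES row and its column names are hypothetical):

```sql
SELECT
    -- case-insensitive match via the "i" flag of like_regex
    jsonb_path_exists(to_jsonb(t), '$.* ? (@ like_regex "duck" flag "i")') AS path_finds_text,
    -- id becomes a JSON number, which like_regex is not expected to match
    jsonb_path_exists(to_jsonb(t), '$.* ? (@ like_regex "42")')            AS path_finds_number,
    -- jsonb_each_text yields every value as text, including numbers
    EXISTS (SELECT 1
            FROM jsonb_each_text(to_jsonb(t)) AS x(col, value)
            WHERE x.value LIKE '%42%')                                     AS each_finds_number
FROM (VALUES (42, 'Donald', 'Duck')) AS t(id, first_name, last_name);
```

Also keep in mind that like_regex takes a regular expression rather than a LIKE pattern, so characters such as . or ( in user input would need escaping.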

Related

Is it a function or a table in this lateral join case?

It is often particularly handy to LEFT JOIN to a LATERAL subquery, so that source rows will appear in the result even if the LATERAL subquery produces no rows for them. For example, if get_product_names() returns the names of products made by a manufacturer, but some manufacturers in our table currently produce no products, we could find out which ones those are like this:
SELECT m.name
FROM manufacturers m LEFT JOIN LATERAL get_product_names(m.id) pname ON true
WHERE pname IS NULL;
The above is an excerpt from the PostgreSQL manual.
Now I think I finally understand what LATERAL means, but in this case I am not sure whether get_product_names is a table or a function. My understanding is one of the following:
A: get_product_names(m.id) is a function that takes m.id as an input parameter and returns a table, which is aliased as pname. Overall, table m is joined to that table, and the WHERE condition keeps the rows where it is null.
B: get_product_names is a table, and table m is left joined to table get_product_names on m.id, with pname as the alias for get_product_names. Overall, table m is joined to that table, and the WHERE condition keeps the rows where it is null.
get_product_names is a table function (also known as set returning function or SRF in PostgreSQL slang). Such a function does not necessarily return a single result row, but arbitrarily many rows.
Since the result of such a function is a table, you typically use it in SQL statements where you would use a table, that is in the FROM clause.
A simple example is
SELECT * FROM generate_series(1, 5);
generate_series
-----------------
1
2
3
4
5
(5 rows)
You can also use normal functions in this way; they are then treated as a table function that returns exactly one row.
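As an illustration, a set-returning function like the one in the manual's example could be defined roughly as follows. The manual does not show the table or function definitions, so the products table, its columns, and the function body here are assumptions:

```sql
-- Hypothetical schema; only the names manufacturers, get_product_names,
-- m.id and m.name come from the manual's example.
CREATE TABLE manufacturers (id int PRIMARY KEY, name text);
CREATE TABLE products (manufacturer_id int REFERENCES manufacturers (id), name text);

-- A table function: returns zero or more rows for one manufacturer.
CREATE FUNCTION get_product_names(int) RETURNS SETOF text
    LANGUAGE sql STABLE
    AS $$ SELECT name FROM products WHERE manufacturer_id = $1 $$;

-- Used in FROM like a table; LATERAL lets it reference m.id.
SELECT m.name
FROM manufacturers m
LEFT JOIN LATERAL get_product_names(m.id) AS pname ON true
WHERE pname IS NULL;

-- An ordinary scalar function in FROM yields exactly one row.
SELECT * FROM upper('abc');
```

Because the function returns a single scalar column, the alias pname also names that column, which is why pname IS NULL can be tested directly in the WHERE clause.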

Hive join tables on string field match

Hi, I am trying to left outer join one table onto another; the matching columns are of type String.
Will Hive join on matching string columns or do they need to be converted to a different datatype?
My join ON clause looks like this:
SELECT table1.para1, table2.para2
FROM table1
LEFT OUTER JOIN table2
ON (table1.a = table2.b)
Columns a and b are strings. Will this work?
Join on string will work. Just keep in mind that string matching will be case-sensitive.
Consider using functions like UPPER or LOWER.
For example,
select * from temp join new_temp on LOWER(temp.dept) = LOWER(new_temp.dept);

SQL with table as becomes ambiguous

Perhaps I'm approaching this all wrong, in which case feel free to point out a better way to solve the overall question, which is "How do I use an intermediate table for future queries?"
Let's say I've got tables foo and bar, which join on some baz_id, and I want to combine them into an intermediate table to be fed into upcoming queries. I know of the WITH .. AS (...) statement, but am running into problems as such:
WITH foobar AS (
SELECT *
FROM foo
INNER JOIN bar ON bar.baz_id = foo.baz_id
)
SELECT
baz_id
-- some other things as well
FROM
foobar
The issue is that Postgres (9.4) tells me baz_id is ambiguous. I understand this happens because SELECT * includes all the columns in both tables, so baz_id shows up twice, but I'm not sure how to get around it. I was hoping to avoid copying the column names out individually, like
SELECT
foo.var1, foo.var2, foo.var3, ...
bar.other1, bar.other2, bar.other3, ...
FROM foo INNER JOIN bar ...
because there are hundreds of columns in these tables.
Is there some way around this I'm missing, or some altogether different way to approach the question at hand?
WITH foobar AS (
SELECT *
FROM foo
INNER JOIN bar USING(baz_id)
)
SELECT
baz_id
-- some other things as well
FROM
foobar
Joining with USING instead of ON leaves only one instance of the baz_id column in the select list.
From the documentation:
The USING clause is a shorthand that allows you to take advantage of the specific situation where both sides of the join use the same name for the joining column(s). It takes a comma-separated list of the shared column names and forms a join condition that includes an equality comparison for each one. For example, joining T1 and T2 with USING (a, b) produces the join condition ON T1.a = T2.a AND T1.b = T2.b.
Furthermore, the output of JOIN USING suppresses redundant columns: there is no need to print both of the matched columns, since they must have equal values. While JOIN ON produces all columns from T1 followed by all columns from T2, JOIN USING produces one output column for each of the listed column pairs (in the listed order), followed by any remaining columns from T1, followed by any remaining columns from T2.
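A minimal way to see the difference in the output columns is sketched below. The two-column table definitions are hypothetical stand-ins for the much wider foo and bar from the question:

```sql
-- Hypothetical two-column versions of foo and bar.
CREATE TEMP TABLE foo (baz_id int, var1 text);
CREATE TEMP TABLE bar (baz_id int, other1 text);

-- JOIN ... ON keeps both copies: baz_id, var1, baz_id, other1
SELECT * FROM foo INNER JOIN bar ON bar.baz_id = foo.baz_id;

-- JOIN ... USING merges them: baz_id, var1, other1
SELECT * FROM foo INNER JOIN bar USING (baz_id);
```

Note that USING only merges the columns it lists. If foo and bar share other column names besides baz_id, those would still appear twice and remain ambiguous when referenced from the CTE.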

PostgreSQL Full Text search

I need to use Full Text Search with PostgreSQL, but I can't find a way to search for a list of words from a table (using tsquery) against an indexed text field (tsvector data type). Can tsquery only process a few words, or can it also process multiple values that come from a table?
Thanks in advance for your help.
Let me try to formulate an answer according to the comments given on the question (if I understand your request correctly).
Problem
You are trying to do a full text search on the table tableA, column indexed_text_field (a tsvector type) based on words that are stored as text in another table tableB in a column called words.
Solution
First, if you wish to feed PostgreSQL multiple tokens (individual words) during a full text search you have two functions at your disposal:
to_tsquery()
plainto_tsquery()
With the first function you need to separate the given tokens with an ampersand (&). The second function can be fed any string of text and will chop it into tokens for you. More info here.
Your challenge is that you wish to select matches based on words present in another table. This can be done in different ways, for example via a simple (INNER) JOIN:
SELECT a.* FROM tableA a, tableB b WHERE a.indexed_text_field @@ to_tsquery(b.words);
Or if you have multiple words in the words column you should most likely be using the plainto_tsquery() function to keep things simple:
SELECT a.* FROM tableA a, tableB b WHERE a.indexed_text_field @@ plainto_tsquery(b.words);
Yet, if you must use the more low-level to_tsquery() version:
SELECT a.* FROM tableA a, tableB b WHERE a.indexed_text_field @@ to_tsquery(replace(b.words, ' ', '&'));
In the latter you replace all spaces between the words with an ampersand, thus making them separate tokens. Mind the index usage on the last one though, as you might need to create an expression index on the usage of the replace() function.
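For reference, here is a self-contained sketch of the plainto_tsquery() variant. The table definitions, the sample rows, the index name, and the choice of the 'english' configuration are all assumptions for illustration; only the table and column names come from the answer above:

```sql
-- Hypothetical definitions and sample data.
CREATE TABLE tableA (
    id                 serial PRIMARY KEY,
    body               text,
    indexed_text_field tsvector
);
CREATE INDEX tablea_fts_idx ON tableA USING gin (indexed_text_field);
CREATE TABLE tableB (words text);

INSERT INTO tableA (body, indexed_text_field)
SELECT s, to_tsvector('english', s)
FROM (VALUES ('The quick brown fox'), ('A lazy dog sleeps')) AS v(s);

INSERT INTO tableB (words) VALUES ('quick fox'), ('lazy dog'), ('purple cat');

-- One result row per (document, search phrase) pair that matches;
-- plainto_tsquery ANDs the words of each phrase together.
SELECT a.id, a.body, b.words
FROM tableA a
JOIN tableB b ON a.indexed_text_field @@ plainto_tsquery('english', b.words);
```

It is usually safest to pass the same text search configuration to to_tsvector and to the query function, so that both sides are stemmed the same way.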

differing column names in self-outer-joins

When writing a self-join in T-SQL I can avoid duplicate column names thus:
SELECT FirstEvent.Title AS FirstTitle, SecondEvent.Title AS SecondTitle
FROM ContiguatedEvents AS FirstEvent
LEFT OUTER JOIN ContiguatedEvents AS SecondEvent
ON FirstEvent.logID = SecondEvent.logID
Suppose I want to select all the columns from the self-join, for example into a view. How do I then differentiate the column names without writing each one out in the join statement? I.e. is there anything I can write like this (ish):
SELECT FirstEvent.* AS ???, SecondEvent.* AS ???
FROM ContiguatedEvents AS FirstEvent
LEFT OUTER JOIN ContiguatedEvents AS SecondEvent
ON FirstEvent.logID = SecondEvent.logID
There's no way to automatically introduce aliases for multiple columns; you just have to do it by hand.
One handy hint for quickly getting all of the column names into your query (in Management Studio) is to drag the Columns folder from the Object Explorer into a query window. It gives you all of the column names.
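If dragging the column list is still too much manual work, one possible shortcut is to let the server generate the aliased select list as text and then paste the result into the view definition. This is only a sketch: the 'First' prefix is my own choice, and it assumes the table name is unique across schemas (otherwise add a TABLE_SCHEMA filter):

```sql
-- Generates one aliased select-list entry per column,
-- e.g. FirstEvent.[Title] AS [FirstTitle],
SELECT 'FirstEvent.' + QUOTENAME(COLUMN_NAME)
       + ' AS ' + QUOTENAME('First' + COLUMN_NAME) + ','
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'ContiguatedEvents'
ORDER BY ORDINAL_POSITION;
```

Running it a second time with 'SecondEvent.' and 'Second' gives the other half of the list; remember to drop the trailing comma from the last entry.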