Hive join tables on string field match - hiveql

Hi, I am trying to left outer join one table onto another; the matching columns are of type String.
Will Hive join on matching string columns or do they need to be converted to a different datatype?
My join ON Clause looks like
SELECT table1.para1, table2.para2
FROM table1
LEFT OUTER JOIN table2
ON (table1.a = table2.b)
a and b are strings. Will this work?

Join on string will work. Just keep in mind that string matching will be case-sensitive.
Consider using functions like UPPER or LOWER.
For example,
select * from temp join new_temp on LOWER(temp.dept) = LOWER(new_temp.dept);
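A minimal sketch of the difference, using Python's built-in sqlite3 module as a stand-in for Hive (an assumption made only so the example runs anywhere; string equality is case-sensitive in both). The staff and departments tables and their contents are hypothetical.

```python
import sqlite3

# SQLite stands in for Hive here; table names and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (emp TEXT, dept TEXT);
    CREATE TABLE departments (dept TEXT, floor INTEGER);
    INSERT INTO staff VALUES ('ann', 'Sales'), ('bob', 'HR');
    INSERT INTO departments VALUES ('SALES', 3), ('HR', 1);
""")

# Plain equality: 'Sales' and 'SALES' do not match, so ann gets NULL.
exact = conn.execute("""
    SELECT staff.emp, departments.floor
    FROM staff
    LEFT OUTER JOIN departments ON staff.dept = departments.dept
    ORDER BY staff.emp
""").fetchall()

# Lower-casing both sides makes the comparison case-insensitive.
folded = conn.execute("""
    SELECT staff.emp, departments.floor
    FROM staff
    LEFT OUTER JOIN departments ON LOWER(staff.dept) = LOWER(departments.dept)
    ORDER BY staff.emp
""").fetchall()

print(exact)   # [('ann', None), ('bob', 1)]
print(folded)  # [('ann', 3), ('bob', 1)]
```

The join on string columns works in both cases; only the rows whose case differs are affected.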

Related

Search through all columns of data

I have a table, for example Products, and it has 10 columns. There are also other tables, which are related to Products.
For example:
Products <-> Keywords (M-M)
Products <-> Warehouse (1-M)
etc
I need to implement a search on all columns, as well as on the columns in the relationship Keywords and Warehouse.
I have a trivial SQL
FROM Product AS A
LEFT JOIN (...Keywords...) AS B
LEFT JOIN (...Warehouse...) AS C
WHERE "A.Name" LIKE '%SOME TEXT%' OR "A.RegistrationDate" LIKE '%SOME TEXT%' etc.
I.e. the user enters a string and I need to search through all columns of the table (they contain int, datetime, and text types) using this string.
I think my implementation is very naive and not really optimized.
Is there any other way to search string in all columns?
I also had the idea of putting all of this into to_tsvector and searching with full text search. Is that a good idea? It is just that I have DateTime and Int types in my columns.
A brute force method would be to convert the entire row to a string and then search that string:
SELECT ...
FROM Product AS a
LEFT JOIN (...Keywords...) AS B
LEFT JOIN (...Warehouse...) AS C
WHERE a::text LIKE '%SOME TEXT%'
Note that casting a row to a text value will separate the column values with commas and enclose it with parentheses, e.g. a row with the values 1, Donald, Duck would become '(1,Donald,Duck)'
So if you are looking for commas or parentheses you might get false positives.
Another option, which is more accurate but a bit more complicated, is to turn the row into JSON and then iterate over all JSON values:
SELECT ...
FROM Product AS a
LEFT JOIN (...Keywords...) AS B
LEFT JOIN (...Warehouse...) AS C
WHERE EXISTS (select *
from jsonb_each_text(to_jsonb(a)) as x(col,value)
where x.value LIKE '%SOME TEXT%')
Or you can use a JSON path query instead:
SELECT ...
FROM Product AS a
LEFT JOIN (...Keywords...) AS B
LEFT JOIN (...Warehouse...) AS C
WHERE jsonb_path_exists(to_jsonb(a), '$.* ? (@ like_regex "SOME TEXT")')
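To see why searching each value is more accurate than searching the whole row as text, here is a small Python simulation. It is not PostgreSQL itself: row_as_text is a hypothetical, simplified imitation of the row-to-text cast (the real cast also quotes some values), and the two matching functions are illustrative helpers.

```python
def row_as_text(row):
    """Simplified imitation of a row cast to text: '(v1,v2,...)'."""
    return "(" + ",".join(str(v) for v in row) + ")"


def matches_whole_row(row, needle):
    """Brute force: search the text form of the entire row."""
    return needle in row_as_text(row)


def matches_any_value(row, needle):
    """Per-value search, as the JSON-based queries do."""
    return any(needle in str(v) for v in row)


row = (1, "Donald", "Duck")
print(row_as_text(row))                  # (1,Donald,Duck)
print(matches_whole_row(row, "Donald"))  # True
print(matches_whole_row(row, "1,Don"))   # True: match spans two columns
print(matches_any_value(row, "1,Don"))   # False
```

A search term containing a comma or parenthesis can match across column boundaries in the whole-row form, which the per-value search avoids.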

UNION types double precision and character varying cannot be matched

I am trying to run a program from my following database
DATABASE
But I am getting the error "UNION types double precision and character varying cannot be matched" every time I try to run the program.
My code is
SELECT
EXTRACT(YEAR
FROM AGE(users.dob)) AS Age,
users.gender,
app.name AS App
FROM users
LEFT OUTER JOIN app_user_profile
ON users.id = app_user_profile.users_id
LEFT OUTER JOIN app
ON app_user_profile.app_id = app.id
UNION
SELECT cities.name AS City, provinces.name AS Province, countries.name AS Country
FROM cities
LEFT OUTER JOIN provinces
ON cities.country_id = provinces.country_id
LEFT OUTER JOIN countries
ON provinces.country_id = countries.country_id
UNIONs in PostgreSQL must have the same number of columns and the columns must be of "compatible data types".
Judging from the names of your columns, these seem to be totally different things that you want to UNION. Which is unusual, but it can be done. The biggest problem seems to be between Age and City which are a number and text/varchar, respectively. If you cast Age as text, the UNION should work.
SELECT
EXTRACT(YEAR
FROM AGE(users.dob))::text AS Age,
....
Type casting can be done either with Column::<new type> or CAST(Column AS <new type>). The first form is PostgreSQL-specific syntax and quick to type; the second conforms to the SQL standard.
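A rough sketch of the standard CAST form inside a UNION, run through Python's built-in sqlite3 module. Note the assumptions: SQLite is loosely typed and would not raise the PostgreSQL error in the first place, so this only illustrates the shape of the fix, and the simplified users and cities tables are hypothetical.

```python
import sqlite3

# SQLite is used only so the sketch is runnable; the tables are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (age INTEGER, gender TEXT);
    CREATE TABLE cities (name TEXT, province TEXT);
    INSERT INTO users VALUES (34, 'f');
    INSERT INTO cities VALUES ('Lyon', 'Rhone');
""")

# Casting the numeric column to text gives both branches of the UNION
# the same column types.
rows = conn.execute("""
    SELECT CAST(age AS TEXT) AS col1, gender AS col2 FROM users
    UNION
    SELECT name, province FROM cities
    ORDER BY col1
""").fetchall()

print(rows)  # [('34', 'f'), ('Lyon', 'Rhone')]
```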

Postgresql to BigQuery - Left Join on X and Y

I have a table with a column (value) that holds different types of information that I need to parse into separate columns. In postgresql, I can easily do this:
SELECT m1.value shipname
, m2.value agent
FROM maritimeDB m1
JOIN maritimeDB m2
ON m1.rowID = m2.rowID
AND m2.itemname = 'Agent'
WHERE m1.rowID
IN (SELECT DISTINCT rowID FROM maritimeDB WHERE entity='9999')
AND m1.itemname='shipname'
I want to do this same sort of query in BigQuery (with JOIN becoming LEFT JOIN), but I get this error:
Error: ON clause must be AND of = comparisons of one field name from each table, with all field names prefixed with table name.
Any suggestions?
This error comes from the Legacy SQL dialect (which is the default). The query should work with the Standard SQL dialect, which supports arbitrary JOIN predicates.
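For reference, a sketch of the same query written with LEFT JOIN and the extra predicate kept in the ON clause. It runs against Python's built-in sqlite3 module rather than BigQuery (an assumption made so the example is self-contained), and the sample rows are invented.

```python
import sqlite3

# SQLite stands in for BigQuery Standard SQL; the data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE maritimeDB (rowID INTEGER, entity TEXT, itemname TEXT, value TEXT);
    INSERT INTO maritimeDB VALUES
        (1, '9999', 'shipname', 'Aurora'),
        (1, '9999', 'Agent',    'Smith'),
        (2, '9999', 'shipname', 'Borealis');
""")

# The ON clause mixes a column-to-column comparison with a
# column-to-constant comparison, which Legacy SQL rejects.
rows = conn.execute("""
    SELECT m1.value AS shipname, m2.value AS agent
    FROM maritimeDB AS m1
    LEFT JOIN maritimeDB AS m2
        ON m1.rowID = m2.rowID
        AND m2.itemname = 'Agent'
    WHERE m1.rowID IN (SELECT DISTINCT rowID FROM maritimeDB WHERE entity = '9999')
      AND m1.itemname = 'shipname'
    ORDER BY m1.rowID
""").fetchall()

print(rows)  # [('Aurora', 'Smith'), ('Borealis', None)]
```

With LEFT JOIN, ships that have no 'Agent' row are kept with a NULL agent instead of being dropped.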

SQL with table as becomes ambiguous

Perhaps I'm approaching this all wrong, in which case feel free to point out a better way to solve the overall question, which is "How do I use an intermediate table for future queries?"
Let's say I've got tables foo and bar, which join on some baz_id, and I want to combine them into an intermediate table to be fed into upcoming queries. I know of the WITH .. AS (...) statement, but am running into problems as such:
WITH foobar AS (
SELECT *
FROM foo
INNER JOIN bar ON bar.baz_id = foo.baz_id
)
SELECT
baz_id
-- some other things as well
FROM
foobar
The issue is that (Postgres 9.4) tells me baz_id is ambiguous. I understand this happens because SELECT * includes all the columns in both tables, so baz_id shows up twice; but I'm not sure how to get around it. I was hoping to avoid copying the column names out individually, like
SELECT
foo.var1, foo.var2, foo.var3, ...
bar.other1, bar.other2, bar.other3, ...
FROM foo INNER JOIN bar ...
because there are hundreds of columns in these tables.
Is there some way around this I'm missing, or some altogether different way to approach the question at hand?
WITH foobar AS (
SELECT *
FROM foo
INNER JOIN bar USING(baz_id)
)
SELECT
baz_id
-- some other things as well
FROM
foobar
It leaves only one instance of the baz_id column in the select list.
From the documentation:
The USING clause is a shorthand that allows you to take advantage of the specific situation where both sides of the join use the same name for the joining column(s). It takes a comma-separated list of the shared column names and forms a join condition that includes an equality comparison for each one. For example, joining T1 and T2 with USING (a, b) produces the join condition ON T1.a = T2.a AND T1.b = T2.b.
Furthermore, the output of JOIN USING suppresses redundant columns: there is no need to print both of the matched columns, since they must have equal values. While JOIN ON produces all columns from T1 followed by all columns from T2, JOIN USING produces one output column for each of the listed column pairs (in the listed order), followed by any remaining columns from T1, followed by any remaining columns from T2.
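A quick way to check the column suppression is to inspect the output columns of the CTE. The sketch below uses Python's built-in sqlite3 module instead of Postgres (an assumption for the sake of a self-contained example; SQLite also drops the duplicate column for JOIN ... USING), with hypothetical one-row tables.

```python
import sqlite3

# SQLite stands in for Postgres; the table contents are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (baz_id INTEGER, var1 TEXT);
    CREATE TABLE bar (baz_id INTEGER, other1 TEXT);
    INSERT INTO foo VALUES (1, 'a');
    INSERT INTO bar VALUES (1, 'x');
""")

cur = conn.execute("""
    WITH foobar AS (
        SELECT *
        FROM foo
        INNER JOIN bar USING (baz_id)
    )
    SELECT * FROM foobar
""")

# cursor.description holds one entry per output column; item 0 is its name.
columns = [d[0] for d in cur.description]
result = cur.fetchall()

print(columns)  # ['baz_id', 'var1', 'other1']
print(result)   # [(1, 'a', 'x')]
```

baz_id appears once, followed by the remaining columns of foo and then those of bar.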

differing column names in self-outer-joins

When writing a self-join in T-SQL I can avoid duplicate column names thus:
SELECT FirstEvent.Title AS FirstTitle, SecondEvent.Title AS SecondTitle
FROM ContiguatedEvents AS FirstEvent
LEFT OUTER JOIN ContiguatedEvents AS SecondEvent
ON FirstEvent.logID = SecondEvent.logID
Suppose I want to select all the columns from the self-join, for example into a view. How do I then differentiate the column names without writing each one out in the join statement? I.e. is there anything I can write like this (ish)?
SELECT FirstEvent.* AS ???, SecondEvent.* AS ???
FROM ContiguatedEvents AS FirstEvent
LEFT OUTER JOIN ContiguatedEvents AS SecondEvent
ON FirstEvent.logID = SecondEvent.logID
There's no way to automatically introduce aliases for multiple columns; you just have to do it by hand.
One handy hint for quickly getting all of the column names into your query (in management studio) is to drag the Columns folder from the Object Explorer into a query window. It gives you all of the column names.
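If the column list is long, the aliased SELECT list can also be generated as text and pasted into the view definition. The Python sketch below is one possible way to do that; aliased_columns is a hypothetical helper, and the column names are typed in by hand (they could equally come from the drag-and-drop trick above).

```python
def aliased_columns(columns, table_alias, prefix):
    """Build 'alias.col AS prefixcol' items for a SELECT list."""
    return [f"{table_alias}.{col} AS {prefix}{col}" for col in columns]


# Hypothetical column list for ContiguatedEvents.
columns = ["logID", "Title"]

select_list = ",\n    ".join(
    aliased_columns(columns, "FirstEvent", "First")
    + aliased_columns(columns, "SecondEvent", "Second")
)

query = (
    "SELECT\n    " + select_list + "\n"
    "FROM ContiguatedEvents AS FirstEvent\n"
    "LEFT OUTER JOIN ContiguatedEvents AS SecondEvent\n"
    "    ON FirstEvent.logID = SecondEvent.logID"
)

print(query)
```

The generated statement still has to be pasted in and maintained by hand if the table's columns change.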