HiveQL - How to handle elements not appearing in dictionary - hiveql

The thing is: I've got this lookup table which I use as a dictionary to create a new column that 'translates' the meaning of a certain column of codes.
Let's say:
Table1:
ID Code
01 A
02 B
03 C
Lookup_table (dictionary):
Code Meaning
A Alice
B Bob
C Charlie
I can easily make a JOIN to create a new table (Table2) with the new column 'Meaning' added to it:
Table2:
CREATE TABLE Table2 AS SELECT T1.ID, T1.Code, LKP.Meaning
FROM Table1 AS T1
LEFT OUTER JOIN Lookup_table AS LKP
ON (T1.Code = LKP.Code);
But what should I do if a new Code appears in Table1 (e.g. ("04", "D")) and there is no translation for it in Lookup_table? I want to avoid getting a NULL in that case. Is there a way to obtain something like 'others' in Meaning instead?
Thanks!

You could use COALESCE() to achieve that. COALESCE() takes one or more arguments and returns the first one that is not NULL.
You can modify your query as follows:
CREATE TABLE Table2 AS
SELECT
T1.ID AS ID,
T1.Code AS Code,
COALESCE(LKP.Meaning,'others') AS Meaning
FROM Table1 AS T1
LEFT OUTER JOIN Lookup_table AS LKP
ON (T1.Code = LKP.Code);
In your case this means passing LKP.Meaning as the first argument. If that value is NULL (no matching row in Lookup_table), the second argument, 'others', is returned instead.
See also the Hive Documentation.
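To see the fallback in action without a Hive cluster, here is a minimal sketch that runs the same LEFT OUTER JOIN with COALESCE() against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in here (an assumption for illustration, not Hive itself), and the extra row ('04', 'D') is the unmatched code from the question.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Hive (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ID TEXT, Code TEXT);
    CREATE TABLE Lookup_table (Code TEXT, Meaning TEXT);
    INSERT INTO Table1 VALUES ('01', 'A'), ('02', 'B'), ('03', 'C'), ('04', 'D');
    INSERT INTO Lookup_table VALUES ('A', 'Alice'), ('B', 'Bob'), ('C', 'Charlie');
""")

# Code 'D' has no row in Lookup_table, so LKP.Meaning is NULL for it
# and COALESCE() falls back to 'others'.
coalesce_rows = conn.execute("""
    SELECT T1.ID, T1.Code, COALESCE(LKP.Meaning, 'others') AS Meaning
    FROM Table1 AS T1
    LEFT OUTER JOIN Lookup_table AS LKP
      ON T1.Code = LKP.Code
    ORDER BY T1.ID
""").fetchall()

for row in coalesce_rows:
    print(row)
```

The last row printed is ('04', 'D', 'others'); the three known codes keep their translations.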

Related

T-SQL select all IDs that have value A and B

I'm trying to find all IDs in TableA that are mentioned by a set of records in TableB, where that set is defined in TableC. So far, a series of INNER JOINs gives me the following result:
TableA.ID | TableB.Code
-----------------------
1 | A
1 | B
2 | A
3 | B
I want to select only the IDs that have an entry for both A and B (here, only ID 1), but the values A and B themselves come from another query.
I figured this should be possible with a GROUP BY TableA.ID and HAVING = ALL(Subquery on table C).
But that is returning no values.
Since you did not post your original query, I will assume it is inside a CTE. Assuming this, the query you want is something along these lines:
SELECT ID
FROM cte
WHERE Code IN ('A', 'B')
GROUP BY ID
HAVING COUNT(DISTINCT Code) = 2;
It's an extremely poor question, but you probably need to compare distinct counts against TableC:
SELECT a.ID
FROM TableA a
GROUP BY a.ID
HAVING COUNT(DISTINCT a.Code) = (SELECT COUNT(*) FROM TableC)
We're guessing though.
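Both answers rely on the same idea: keep only the required codes, group by ID, and check that the number of distinct codes equals the size of the required set. A minimal sketch of that check follows, run on SQLite through Python's sqlite3 module rather than SQL Server. The table name joined (holding the result of the INNER JOINs) and the layout of TableC (a single Code column) are assumptions, since the original query was not posted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table holding the result of the question's INNER JOINs.
    CREATE TABLE joined (ID INTEGER, Code TEXT);
    -- Hypothetical layout for TableC: one row per required code.
    CREATE TABLE TableC (Code TEXT);
    INSERT INTO joined VALUES (1, 'A'), (1, 'B'), (2, 'A'), (3, 'B');
    INSERT INTO TableC VALUES ('A'), ('B');
""")

# Keep only required codes, then require that each ID covers all of them.
matching_ids = [r[0] for r in conn.execute("""
    SELECT j.ID
    FROM joined AS j
    WHERE j.Code IN (SELECT Code FROM TableC)
    GROUP BY j.ID
    HAVING COUNT(DISTINCT j.Code) = (SELECT COUNT(DISTINCT Code) FROM TableC)
    ORDER BY j.ID
""")]

print(matching_ids)
```

With the sample data from the question, only ID 1 has both A and B, so it is the only ID returned.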

JPA query returning a Tuple where one part is an Entity

I have two unrelated tables that I want to do a LEFT JOIN on. I only want one column from the left table, but the entire entity from the right table (which I intend to update if it's present, or create if not).
Simplified version of tables:
TABLE1
id, type, data
TABLE2
id, type, and, other, stuff
Current JPQL:
SELECT T1.type,
(SELECT T2
FROM TABLE2 T2
WHERE T2.id = T1.id
AND T2.type = T1.type)
FROM TABLE1 T1
WHERE T1.id = :ID
I am currently getting some sort of logical union error.
Can this be done, or should I just use separate queries?
The exact exception is:
Caused by: java.lang.ClassCastException: org.apache.openjpa.jdbc.sql.LogicalUnion$UnionSelect incompatible with org.apache.openjpa.jdbc.sql.SelectImpl
The Java code I use follows:
Query q = this.em.createQuery(jql, Tuple.class);
q.setParameter("ID", id);
@SuppressWarnings("unchecked")
List<Tuple> result = q.getResultList();
The subquery is not essential to my solution. It's just the only form that was parseable; a regular SQL LEFT JOIN wasn't. In words, what I am trying to do is, for a given ID in TABLE1, find all rows in TABLE2 that have the same ID and type, or null if there is no such row. Later code will create rows in TABLE2 where there are none for the id and type. I'm expecting 2-3 types per ID in TABLE1, with a matching row in TABLE2 about half the time.

How to join a vertical and a horizontal table together

I have two tables. One of them is vertical, i.e. it stores only key-value pairs with a reference id from table 1. I want to join both tables and display the key-value pairs as columns in the select, and also sort on a few of the keys.
T1 having (id,empid,dpt)
T2 having (empid,key,value)
select
T1.*,
t21.value,
t22.value,
t23.value
from Table1 t1
join Table2 t21 on t1.empid = t21.empid
join Table2 t22 on t1.empid = t22.empid
join Table2 t23 on t1.empid = t23.empid
where
t21.key = 'FNAME'
and t22.key = 'LNAME'
and t23.key = 'AGE'
The query you demonstrate is very inefficient (another join for each additional column) and also has a potential problem: if T2 does not contain a row for every key in the WHERE clause, the whole row is excluded.
The second problem can be avoided with LEFT [OUTER] JOIN instead of [INNER] JOIN. But don't bother: the solution to the first problem is a completely different query. "Pivot" T2 using crosstab() from the additional module tablefunc:
SELECT * FROM crosstab(
'SELECT empid, key, value FROM t2 ORDER BY 1'
, $$VALUES ('FNAME'), ('LNAME'), ('AGE')$$ -- more?
) AS ct (empid int -- use *actual* data types
, fname text
, lname text
, age text);
-- more?
Then just join to T1:
select *
from t1
JOIN (<insert query from above>) AS t2 USING (empid);
This time you may want to use [INNER] JOIN.
The USING clause conveniently removes the second instance of the empid column.
Detailed instructions:
PostgreSQL Crosstab Query
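crosstab() is specific to PostgreSQL's tablefunc module. As a portable illustration of the same pivot, here is a sketch that swaps in a different technique, conditional aggregation (MAX(CASE ...) with GROUP BY), and runs it on SQLite through Python's sqlite3 module. The sample rows are made up for the example; employee 20 deliberately lacks LNAME and AGE rows to show that the LEFT JOIN keeps the employee instead of dropping the row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
    CREATE TABLE t1 (id INTEGER, empid INTEGER, dpt TEXT);
    CREATE TABLE t2 (empid INTEGER, "key" TEXT, value TEXT);
    -- Made-up sample data for illustration.
    INSERT INTO t1 VALUES (1, 10, 'sales'), (2, 20, 'ops');
    INSERT INTO t2 VALUES
        (10, 'FNAME', 'Ann'), (10, 'LNAME', 'Lee'), (10, 'AGE', '31'),
        (20, 'FNAME', 'Bo');
''')

# One pass over t2: each CASE picks the value for one key, MAX collapses
# the group to a single row per employee (NULL when the key is missing).
pivoted = conn.execute('''
    SELECT t1.id, t1.empid, t1.dpt,
           MAX(CASE WHEN t2."key" = 'FNAME' THEN t2.value END) AS fname,
           MAX(CASE WHEN t2."key" = 'LNAME' THEN t2.value END) AS lname,
           MAX(CASE WHEN t2."key" = 'AGE'   THEN t2.value END) AS age
    FROM t1
    LEFT JOIN t2 ON t2.empid = t1.empid
    GROUP BY t1.id, t1.empid, t1.dpt
    ORDER BY fname
''').fetchall()

for row in pivoted:
    print(row)
```

Because the pivoted keys are ordinary output columns, sorting on them is a plain ORDER BY, as with fname above.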

TSQL -- Where Statements on Multiple columns in Update

My basic question has to do with updating multiple columns at once from specified values in my query. The reason I want to do this is that I am updating my values from a ginormous table, so I only want to query it once in order to reduce run time. Here is an example select statement that returns the value I want for just one of the columns I need to update:
select a.Value
from Table1
left outer join
(
select ID, FilterCol1, FilterCol2, Value
from Table2
) a on a.ID = Table1.ID
where {Condition1a on FilterCol1}
and {Condition2a on FilterCol2}
In order to update multiple columns at once, I would like to be able to do something like this (but it returns NULL):
Update T1
set T1Value1 = (select a.Value where {Condition1a on FilterCol1}
and {Condition2a on FilterCol2})
,T1Value2 = (select a.Value where {Condition1b on FilterCol1}
and {Condition2b on FilterCol2})
from Table1 T1
left outer join
(
select ID, FilterCol1, FilterCol2, Value
from Table2
) a on a.ID = T1.ID
Any help figuring this out would be greatly appreciated, let me know if you have any questions or if I made any errors. Thanks!
EDIT: I think I have identified the problem, but I'm not sure of a solution yet. I think seeing the issue requires a little more context: The select from table 2 is actually an unpivot on a wide table. This means that when the left outer join is applied, there will be multiple rows for a given ID. What the case statement that Earl suggested seems to be doing (and I assume this is happening with the where clause as well) is comparing my Conditions to only the first row of the columns from a. Since my conditions are meant to help determine which of the rows from a is chosen, they will always evaluate false for the first row (I know this just from what I know about the data), hence my perpetual NULL values. Does anyone know of a workaround to look at the other rows in a?
UPDATE T1
SET T1Value1 = CASE WHEN (FilterCol1 = Condition1a AND FilterCol2 = Condition2a) THEN a.Value END,
T1Value2 = CASE WHEN (FilterCol1 = Condition1b AND FilterCol2 = Condition2b) THEN a.Value END
FROM Table1 T1
left outer join
(
select ID, FilterCol1, FilterCol2, Value
from Table2
) a on a.ID = T1.ID
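One possible workaround for the multiple-rows-per-ID problem described in the EDIT is to collapse the derived table to one row per ID before joining, using conditional aggregation (MAX(CASE ...) with GROUP BY ID), so that every row of a is inspected in a single pass over Table2. The sketch below only shows the values such a join would produce. It runs on SQLite through Python's sqlite3 module rather than SQL Server, and the sample rows and the literal conditions ('a'/'p' and 'b'/'q') are hypothetical placeholders for the question's {Condition...} expressions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ID INTEGER);
    CREATE TABLE Table2 (ID INTEGER, FilterCol1 TEXT, FilterCol2 TEXT, Value TEXT);
    -- Hypothetical data: several Table2 rows per ID, and the first row
    -- for ID 1 matches neither condition.
    INSERT INTO Table1 VALUES (1), (2);
    INSERT INTO Table2 VALUES
        (1, 'x', 'p', 'skip'),
        (1, 'a', 'p', 'one-a'),
        (1, 'b', 'q', 'one-b'),
        (2, 'a', 'p', 'two-a');
""")

# The derived table has exactly one row per ID, so the outer join cannot
# silently pick an arbitrary row out of several candidates.
per_id_values = conn.execute("""
    SELECT T1.ID, agg.Value1, agg.Value2
    FROM Table1 AS T1
    LEFT OUTER JOIN (
        SELECT ID,
               MAX(CASE WHEN FilterCol1 = 'a' AND FilterCol2 = 'p' THEN Value END) AS Value1,
               MAX(CASE WHEN FilterCol1 = 'b' AND FilterCol2 = 'q' THEN Value END) AS Value2
        FROM Table2
        GROUP BY ID
    ) AS agg ON agg.ID = T1.ID
    ORDER BY T1.ID
""").fetchall()

for row in per_id_values:
    print(row)
```

In T-SQL the same grouped derived table could take the place of a in the UPDATE ... FROM statement, with T1Value1 = agg.Value1 and T1Value2 = agg.Value2. This assumes at most one row per ID satisfies each condition; otherwise MAX picks the largest matching value.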

Join table variable vs join view

I have a stored procedure which is running quite slowly. Therefore I want to extract part of the query into a separate view.
My code looks something like this:
DECLARE @tmpTable TABLE(..)
INSERT INTO @tmpTable (..) *query* (returns 3000 rows)
Select ... from table1
inner join table2
inner join table3
inner join @tmpTable
...
I then extract (copy-paste) the *query* and put it in a view - i.e. vView.
Doing this will then give me a different result:
Select ... from table1
inner join table2
inner join table3
inner join vView
...
Why? I can see that vView and @tmpTable both return 3000 rows, so they should match (I also ran an EXCEPT query to check).
Any comments would be much appreciated, as I feel quite stuck with this.
EDITED:
This is the full query for getting the result (using @tmpTable or vView gives me different results, although they appear the same):
select dep.sid as depsid, dep.[name], COUNT(b.sid) as possiblelogins, count(ls.clientsid) as logins
from department dep
inner join relationship r on dep.sid=r.primarysid and r.relationshiptypeid=27 and r.validto is null
inner join [user] u on r.secondarysid=u.sid
inner join relationship r2 on u.sid=r2.secondarysid and r2.validto is null and r2.relationshiptypeid in (1,37)
inner join client c on r2.primarysid=c.sid
inner join ***@tmpTable or vView*** b on b.sid = c.sid
left outer join (select distinct clientsid from logonstatistics) as ls on b.sid=ls.clientsid
GROUP BY dep.sid, dep.[name],dep.isdepartment
HAVING dep.isdepartment=1
You may not need the view/table at all if you change the query as follows.
It joins on to client c and appears to be there only to JOIN onto logonstatistics.
--remove inner join ***@tmpTable or vView*** b on b.sid = c.sid
--change JOIN
left outer join (select distinct clientsid from logonstatistics) as ls on c.sid=ls.clientsid
And change COUNT(b.sid) to COUNT(c.sid) in the SELECT clause.
Otherwise, if you get different results, there are two possibilities I can see:
The table and the view have different data. Have you run a line-by-line comparison?
One has NULL where the other has a value (especially for the sid column, which will affect the JOIN).
Finally, when you say "different results", do you mean you get x2 or x3 rows? A different COUNT? What?