Can we write a hive query in Spark - UDF - scala

Can we run a Hive query inside a Spark UDF?
For example, I have two tables, A and B, where b1 contains the column names of A and b2 contains the values of that column in A.
I want to query the tables so that the values in A's columns are replaced with the matching b3 values from B, based on the column name and its value.
To achieve that, I wrote a Spark UDF, convert, as below:
def convert(colname: String, colvalue:String)={
sqlContext.sql("SELECT b3 from B where b1 = colname and b2 = colvalue").toString;
}
I registered it as:
sqlContext.udf.register("conv",convert(_:String,_:String));
Now my main query is-
val result = sqlContext.sql("select a1 , conv('a2',a2), conv('a3',a3)");
result.take(2);
It throws java.lang.NullPointerException.
Can someone please tell me whether this is supported in Spark/Hive?
Any other approach is also welcome.
Thanks!

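No, a UDF cannot run a query inside it. The UDF body runs on the executors, where sqlContext is not usable, which is most likely why the call fails with a NullPointerException.
You can only pass data in as arguments and transform it to return the final result at row/column/table level.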

Here is the solution to your question. You can do it in Hive itself.
WITH a_plus_col
AS (SELECT a1
          ,'a2' AS col_name
          ,a2 AS col_value
    FROM A
    UNION ALL
    SELECT a1
          ,'a3' AS col_name
          ,a3 AS col_value
    FROM A)
SELECT a_plus_col.a1 AS r1
      ,MAX(CASE WHEN a_plus_col.col_name = 'a2' THEN B.b3 END) AS r2
      ,MAX(CASE WHEN a_plus_col.col_name = 'a3' THEN B.b3 END) AS r3
FROM a_plus_col
INNER JOIN B ON ( a_plus_col.col_name = b1 AND a_plus_col.col_value = b2)
GROUP BY a_plus_col.a1;
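To see what the query above does, here is a minimal sketch that runs the same unpivot, join, and pivot steps against SQLite through Python's sqlite3 module instead of Hive. The sample rows in A and B are made up for illustration, since the original tables were not shown.

```python
import sqlite3

# Hypothetical sample data; the tables from the question were not shown.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (a1 TEXT, a2 TEXT, a3 TEXT);
    CREATE TABLE B (b1 TEXT, b2 TEXT, b3 TEXT);
    INSERT INTO A VALUES ('k1', 'x', 'p'), ('k2', 'y', 'q');
    INSERT INTO B VALUES
        ('a2', 'x', 'X-mapped'), ('a2', 'y', 'Y-mapped'),
        ('a3', 'p', 'P-mapped'), ('a3', 'q', 'Q-mapped');
""")

query = """
WITH a_plus_col AS (
    -- Unpivot: one row per (a1, column name, column value)
    SELECT a1, 'a2' AS col_name, a2 AS col_value FROM A
    UNION ALL
    SELECT a1, 'a3' AS col_name, a3 AS col_value FROM A
)
SELECT a_plus_col.a1 AS r1,
       -- Pivot back: pick the looked-up b3 for each original column
       MAX(CASE WHEN a_plus_col.col_name = 'a2' THEN B.b3 END) AS r2,
       MAX(CASE WHEN a_plus_col.col_name = 'a3' THEN B.b3 END) AS r3
FROM a_plus_col
INNER JOIN B ON (a_plus_col.col_name = B.b1 AND a_plus_col.col_value = B.b2)
GROUP BY a_plus_col.a1
ORDER BY r1
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Because of the INNER JOIN, a row of A whose values have no match at all in B drops out of the result; a LEFT JOIN would keep it with NULLs.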

Related

Informix 14.10: how to make a SELECT return a specific phrase such as NONE or blank instead of no result

I have a query like this:
select c1 , ( select d1 from table2 where dt) from table1 where ct
but if no d1 matches condition dt, I get no result. Instead I want a result like this:
--c1 d1
value1 NONE or Blank
value 2 NONE or Blank
. .
. .
Can anybody Help?
The NVL function can be used to return either of its two arguments depending on whether the first evaluates to NULL. So your example query could be written as:
select c1 , NVL(( select d1 from table2 where dt), "NONE") from table1 where ct
The data types of the two arguments need to be compatible, for example both character or both numeric.
More information can be found at https://www.ibm.com/support/knowledgecenter/SSGU8G_14.1.0/com.ibm.sqls.doc/ids_sqs_1445.htm
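The same pattern can be tried out locally. This is a sketch only: it uses SQLite through Python's sqlite3 module, and since SQLite has no NVL, it substitutes IFNULL, which behaves the same way for two arguments. The tables, the key column k, and the sample rows are assumptions made for the example.

```python
import sqlite3

# Hypothetical schema and data; IFNULL stands in for Informix NVL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (c1 TEXT);
    CREATE TABLE table2 (k TEXT, d1 TEXT);
    INSERT INTO table1 VALUES ('value1'), ('value2');
    INSERT INTO table2 VALUES ('value1', 'found');
""")

rows = conn.execute("""
    SELECT c1,
           -- A scalar subquery with no matching row yields NULL,
           -- which IFNULL then replaces with 'NONE'.
           IFNULL((SELECT d1 FROM table2 WHERE table2.k = table1.c1), 'NONE') AS d1
    FROM table1
    ORDER BY c1
""").fetchall()
print(rows)
```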

Lag Function to populate missing rows

I am trying to come up with a SQL query to transform this data in a Postgres 9.6 database table.
Table Data
I have tried several variations of window functions, but none of them seem to work.
Based on the input column C3, I am projecting a fourth column C4, and the output should look like the one below.
Final Desired output
How can I accomplish this using SQL? The table can have up to 100 million records.
I was able to get the desired output using the following SQL:
with t2 as (select c1, c3 from Test_table where c3 is not null)
update Test_table t1
set c3 = t2.c3
from t2
where t1.c1 <= t2.c1
and t1.c3 is null;
select
c1
,c2
,C3
,dense_rank() over(order by c3) cr
from Test_table
order by c1;
Use a window function to select the smallest c3 in a window ordered by c1 descending, but sort the whole output by c1 ascending:
select c1, c2, c3, min(c3) over (order by c1 desc) as c4 from t order by c1;
I ran the SQL you provided and got this output. I thought the picture described what I really wanted. Hopefully, showing the output of your SQL next to the desired output will help.
SQL output from your query and desired output
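To make clear what the min(c3) over (order by c1 desc) query actually computes, here is a small sketch run against SQLite (3.25 or later, for window functions) through Python's sqlite3 module. The table t and its rows are invented, because the data from the question was only available as images. Note that the running minimum only behaves like a backfill when the non-null c3 values increase with c1, as they do in this made-up data.

```python
import sqlite3

# Hypothetical data: c3 is only filled in on some rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (c1 INTEGER, c2 TEXT, c3 INTEGER);
    INSERT INTO t VALUES
        (1, 'a', NULL), (2, 'b', NULL), (3, 'c', 10),
        (4, 'd', NULL), (5, 'e', 20);
""")

rows = conn.execute("""
    SELECT c1, c2, c3,
           -- Running minimum over rows with c1 >= the current c1
           MIN(c3) OVER (ORDER BY c1 DESC) AS c4
    FROM t
    ORDER BY c1
""").fetchall()
for row in rows:
    print(row)
```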

Inner join within update statement in PostgreSQL

I have a table called tem_table which consists of the following rows:
cola colb result
----------------
p4 s1 0
p8 s1 0
p9 s1 0
p5 f1 0
p8 f1 0
Now I need to update the result column with the count(*) per colb, for which I am trying the following query:
update tem_table
set result = x.result
from tem_table tt
inner join(select colb,count(*) as result from tem_table group by colb) x
on x.colb = tt.colb;
And selecting distinct colb and result from tem_table:
select distinct colb,result from tem_table;
Getting output:
colb result
-----------
s1 3
f1 3
But the expected output is:
colb result
-----------
s1 3
f1 2
I can't see where my query goes wrong. Please help. Thanks.
You should not repeat the table to be updated in the from clause. This will create a cartesian self join.
Quote from the manual:
Note that the target table must not appear in the from_list, unless you intend a self-join (in which case it must appear with an alias in the from_list)
(Emphasis mine)
Unfortunately, UPDATE does not let you join the target table using the JOIN keyword; the condition linking it to the FROM items goes in the WHERE clause. Something like this should work:
update tem_table
set result = x.result
from (
select colb,count(*) as result
from tem_table
group by colb
) x
where x.colb = tem_table.colb;
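As a quick check of the corrected statement, the sketch below loads the rows from the question into SQLite through Python's sqlite3 module and runs the same UPDATE ... FROM. This is an approximation of the PostgreSQL behaviour, not a PostgreSQL test: SQLite only accepts UPDATE ... FROM from version 3.33 on, so the sketch falls back to a correlated subquery on older builds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tem_table (cola TEXT, colb TEXT, result INTEGER);
    INSERT INTO tem_table VALUES
        ('p4', 's1', 0), ('p8', 's1', 0), ('p9', 's1', 0),
        ('p5', 'f1', 0), ('p8', 'f1', 0);
""")

if sqlite3.sqlite_version_info >= (3, 33, 0):
    # Same shape as the answer: the target table appears only after UPDATE,
    # and the link to the derived table x lives in the WHERE clause.
    conn.execute("""
        UPDATE tem_table
        SET result = x.result
        FROM (SELECT colb, count(*) AS result
              FROM tem_table
              GROUP BY colb) AS x
        WHERE x.colb = tem_table.colb
    """)
else:
    # Fallback for older SQLite builds without UPDATE ... FROM.
    conn.execute("""
        UPDATE tem_table
        SET result = (SELECT count(*) FROM tem_table AS t2
                      WHERE t2.colb = tem_table.colb)
    """)

rows = conn.execute(
    "SELECT DISTINCT colb, result FROM tem_table ORDER BY colb"
).fetchall()
print(rows)
```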

Common records for 2 fields in a table?

I have a Table which has 2 fields say A,B. Suppose A has values a1,a2.
Corresponding records for a1 in B are 1,2,3,x,y,z.
Corresponding records for a2 in B are 1,2,3,4,d,e,f
I need a query for DB2 that fetches the records in B that are common to every value of A (a1 and a2).
So here the output would be:
A B
a1 1
a1 2
a1 3
a2 1
a2 2
a2 3
Can someone please help on this?
Try something like:
SELECT A, B
FROM Table t1
WHERE (SELECT COUNT(*) FROM Table t2 WHERE t2.B = t1.B)
= (SELECT COUNT(DISTINCT t3.A) FROM Table t3)
ORDER BY A, B
This might not be 100% accurate as I can't test it out in DB2 so you might have to tweak the query a little bit to make it work.
with t(num) as (select count(distinct A) from table)
select t1.A, t1.B
from table t1, table t2, t
where t1.B = t2.B
group by t1.A, t1.B, num
having count(*) = num
Basically, the idea is to self-join the table on column B and keep only the rows whose match count equals the number of distinct values in column A, which indicates that the B value is common to all A values. This assumes each (A, B) pair appears only once in the table.
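The self-join approach can be checked with the data from the question. The sketch below runs it on SQLite through Python's sqlite3 module rather than DB2, and uses the hypothetical table name tab because "table" itself is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (A TEXT, B TEXT)")
# Data from the question: a1 -> 1,2,3,x,y,z and a2 -> 1,2,3,4,d,e,f
data = [("a1", b) for b in "123xyz"] + [("a2", b) for b in "1234def"]
conn.executemany("INSERT INTO tab VALUES (?, ?)", data)

rows = conn.execute("""
    WITH t(num) AS (SELECT count(DISTINCT A) FROM tab)
    SELECT t1.A, t1.B
    FROM tab t1, tab t2, t
    WHERE t1.B = t2.B
    GROUP BY t1.A, t1.B, num
    -- Keep a B value only if it occurs once for every distinct A
    HAVING count(*) = num
    ORDER BY t1.A, t1.B
""").fetchall()
for row in rows:
    print(row)
```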

jdbc data comparison

I want to compare the content of 15 columns in two rows.
I am using DB2 9 with JDBC.
Can I use a SQL query to get a result like "match" or "not match"?
And how can I find out which columns differ?
You can use the EXCEPT operator to do this.
In the example below, I'm using common table expressions to fetch the single rows (assuming, in this case, that id is the primary key).
with r1 as (select c1, c2, ..., c15 from t where id = 1),
r2 as (select c1, c2, ..., c15 from t where id = 2)
select * from r1
except
select * from r2
If this returns 0 rows, then the rows are identical. If it returns a row, then the two rows differ.
If you really want the result to be 'MATCH' or 'NOT MATCH':
with r1 as (select c1, c2, ..., c15 from t where id = 1),
r2 as (select c1, c2, ..., c15 from t where id = 2),
rs as (select * from r1 except select * from r2)
select
case when count(*) = 0 then 'MATCH'
else 'NOT MATCH'
end as comparison
from
rs;
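Here is a minimal sketch of the same EXCEPT-based comparison, run on SQLite through Python's sqlite3 module instead of DB2 over JDBC. The table t, its three data columns (standing in for the 15 in the question), the sample rows, and the helper name compare_rows are all assumptions made for the example.

```python
import sqlite3

def compare_rows(conn, id_a, id_b):
    """Return 'MATCH' if the two rows have identical column values."""
    query = """
    WITH r1 AS (SELECT c1, c2, c3 FROM t WHERE id = ?),
         r2 AS (SELECT c1, c2, c3 FROM t WHERE id = ?),
         rs AS (SELECT * FROM r1 EXCEPT SELECT * FROM r2)
    SELECT CASE WHEN count(*) = 0 THEN 'MATCH'
                ELSE 'NOT MATCH'
           END AS comparison
    FROM rs
    """
    return conn.execute(query, (id_a, id_b)).fetchone()[0]

# Hypothetical data: rows 1 and 2 are identical, row 3 differs in c2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY, c1 TEXT, c2 TEXT, c3 TEXT);
    INSERT INTO t VALUES (1, 'a', 'b', 'c'), (2, 'a', 'b', 'c'), (3, 'a', 'x', 'c');
""")

same = compare_rows(conn, 1, 2)
different = compare_rows(conn, 1, 3)
print(same, different)
```

One thing to keep in mind: if the first id does not exist, r1 is empty and the query still reports MATCH, so the caller should make sure both rows exist.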