Include column names in the results of an unpivot query - tsql

We have this UNPIVOT query:
SELECT Value
FROM (Your select statement) as x
UNPIVOT (Value FOR val IN (a, b, c, d)) as p
That produces results like this:
Value
value1
value2
value3
value4
How can we extend the query to also include the column names?
Value ColumnName
value1 a
value2 b
value3 c
value4 d

Thank you to Giorgi. Add val to the SELECT clause:
SELECT Value, val
FROM (Your select statement) as x
UNPIVOT (Value FOR val IN (a, b, c, d)) as p
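As a self-contained illustration, here is a minimal sketch with a hypothetical derived table standing in for "Your select statement" (the literal values are assumptions, not from the original post):
SELECT Value, val AS ColumnName
FROM (SELECT 'value1' AS a, 'value2' AS b, 'value3' AS c, 'value4' AS d) AS x
UNPIVOT (Value FOR val IN (a, b, c, d)) AS p
This returns one row per source column, with val holding the name of the column each Value came from.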

Related

How to compare all fields in several rows in one table with result true or false (PostgreSQL)

I have a table like this (for example):
Field1 Field2 Field3 Field4 ...
------ ------ ------ ------
1      a      c      c
1      a      x      c
1      a      c      c
2      a      y      j
2      b      y      k
2      b      y      l
I need to select rows by one value of one field and then compare all fields across the selected rows, something like SELECT * WHERE Field1 = 1 ... COMPARE.
I would like to have a result like:
Field1 Field2 Field3 Field4 ...
------ ------ ------ ------
true   true   false  true
This should work for fixed columns and if there are no NULL values:
SELECT
    COUNT(DISTINCT t.col1) = 1,
    COUNT(DISTINCT t.col2) = 1,
    COUNT(DISTINCT t.col3) = 1,
    ...
FROM mytable t
WHERE t.filter_column = 'some_value'
GROUP BY t.filter_column;
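Instantiated for the sample table above (assuming, hypothetically, that it is named mytable with columns field1 ... field4), this becomes:
SELECT
    COUNT(DISTINCT t.field1) = 1,
    COUNT(DISTINCT t.field2) = 1,
    COUNT(DISTINCT t.field3) = 1,
    COUNT(DISTINCT t.field4) = 1
FROM mytable t
WHERE t.field1 = 1
GROUP BY t.field1;
For Field1 = 1 this returns true, true, false, true, matching the desired output.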
If you have nullable columns, you could try something like this instead of the COUNT(DISTINCT t.<colname>) = 1:
BOOL_AND(NOT EXISTS(
    SELECT 1
    FROM mytable t2
    WHERE t2.filter_column = 'some_value'
      AND t2.<colname> IS DISTINCT FROM t.<colname>
))
If the columns are not fixed, build the query dynamically in a function that takes the table name, the name of the filter column, and the filter value as parameters.
Another remark: if you remove the filter (the condition t.filter_column = 'some_value') and add t.filter_column itself as an output column, you receive the result of this query for all distinct values of the filter column at once, as in the sketch below.
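A minimal sketch of that variant, under the same assumed names:
SELECT
    t.field1,
    COUNT(DISTINCT t.field2) = 1,
    COUNT(DISTINCT t.field3) = 1,
    COUNT(DISTINCT t.field4) = 1
FROM mytable t
GROUP BY t.field1;
This returns one row per distinct Field1 value (here 1 and 2), each carrying the per-column comparison results for that group.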

DB2: SQL to return all rows in a group having a particular value of a column in two latest records of this group

I have a DB2 table where one of the columns (A) holds either the value PQR or XYZ.
I need to output all rows of each group (keyed by column B) whose two latest records, based on the date in column C, both have A = PQR.
Sample Table
A B C
--- ----- ----------
PQR Mark 08/08/2019
PQR Mark 08/01/2019
XYZ Mark 07/01/2019
PQR Joe 10/11/2019
XYZ Joe 10/01/2019
PQR Craig 06/06/2019
PQR Craig 06/20/2019
In this sample table, my output would be the Mark and Craig records.
Since 11.1
You may use the NTH_VALUE OLAP function (refer to the OLAP specification).
SELECT A, B, C
FROM
(
    SELECT
        A, B, C
        -- C1/C2: the A value of the latest / second-latest row (by C) in each B group
        , NTH_VALUE (A, 1) OVER (PARTITION BY B ORDER BY C DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) C1
        , NTH_VALUE (A, 2) OVER (PARTITION BY B ORDER BY C DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) C2
    FROM TAB
)
WHERE C1 = 'PQR' AND C2 = 'PQR'
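For the sample data, the Mark and Craig partitions both yield C1 = C2 = 'PQR', while the Joe partition yields C1 = 'PQR' and C2 = 'XYZ', so only the Mark and Craig rows survive the filter.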
Older versions
SELECT T.*
FROM TAB T
JOIN
(
    -- B values whose two latest rows (by C) both have A = 'PQR'
    SELECT B
    FROM
    (
        SELECT
            A, B
            , ROWNUMBER() OVER (PARTITION BY B ORDER BY C DESC) RN
        FROM TAB
    )
    WHERE RN IN (1, 2)
    GROUP BY B
    -- MIN(A) = MAX(A) means the two rows agree, COUNT(1) = 2 means the group
    -- really has two rows, and MIN(A) = 'PQR' pins the agreed value
    HAVING MIN(A) = MAX(A) AND COUNT(1) = 2 AND MIN(A) = 'PQR'
) G ON G.B = T.B;
A simple solution, if you only need the two latest PQR rows overall rather than per group, could be:
SELECT A, B, C
FROM TAB
WHERE A = 'PQR'
ORDER BY C DESC
FETCH FIRST 2 ROWS ONLY

Can we write a hive query in Spark - UDF

Can we write a Hive query in a Spark UDF?
E.g., I have two tables, A and B, where B's column b1 contains column names of A and b2 contains the value of that column in A. (The original post showed the tables and the expected result as images.)
I want to query the tables in such a way that the values of the columns in A are replaced with B's b3, based on the column names and their corresponding values.
To achieve that I wrote a Spark UDF, convert, as below:
def convert(colname: String, colvalue: String) = {
    sqlContext.sql("SELECT b3 FROM B WHERE b1 = colname AND b2 = colvalue").toString
}
I registered it as:
sqlContext.udf.register("conv", convert(_: String, _: String))
Now my main query is:
val result = sqlContext.sql("select a1, conv('a2', a2), conv('a3', a3) from A")
result.take(2);
It gives me java.lang.NullPointerException.
Can someone please tell me whether this is supported in Spark/Hive?
Any other approach is also welcome.
Thanks!
No, a UDF does not permit running a query inside it; the sqlContext exists only on the driver, not on the executors where the UDF runs, hence the NullPointerException.
You can only pass data in as variables and apply transformations to get the final result back at the row/column/table level.
Here is the solution to your question. You can do it in Hive itself.
WITH a_plus_col AS (
    -- Unpivot A into (a1, column name, column value) rows
    SELECT a1, 'a2' AS col_name, a2 AS col_value
    FROM A
    UNION ALL
    SELECT a1, 'a3' AS col_name, a3 AS col_value
    FROM A
)
SELECT a_plus_col.a1 AS r1
    -- Pivot back, picking up the mapped value b3 for each column
    , MAX(CASE WHEN a_plus_col.col_name = 'a2' THEN B.b3 END) AS r2
    , MAX(CASE WHEN a_plus_col.col_name = 'a3' THEN B.b3 END) AS r3
FROM a_plus_col
INNER JOIN B ON (a_plus_col.col_name = b1 AND a_plus_col.col_value = b2)
GROUP BY a_plus_col.a1;
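To see the mechanics, suppose (hypothetically, since the original post's tables were images) A holds the row (a1 = 1, a2 = 'x', a3 = 'y') and B holds the rows ('a2', 'x', 'X1') and ('a3', 'y', 'Y1'). The CTE unpivots A into (1, 'a2', 'x') and (1, 'a3', 'y'); the join attaches the mapped value b3 to each; and the GROUP BY pivots back to the single row (1, 'X1', 'Y1').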

Dedup using SQL on a huge 1 billion data set

I am having out-of-memory issues while trying to dedup a table containing a huge amount of data.
Scenario :
Column A | Column B ( Date )
Value1 Date1
Value1 Date2
Value2 Date3
Value2 Date4
I need to dedup these rows, picking for each value of column A the latest record by column B.
Let's say Date2 and Date4 are the latest dates. My output should be:
Column A | Column B ( Date )
Value1 Date2
Value2 Date4
Currently I am using the query below, which works. Is there a better way of doing this that uses less memory?
CREATE TABLE unique_tablename AS (
    SELECT a.column_a, a.column_b, a.column_c, a.column_d
    FROM tablename a,
         (SELECT column_a, MAX(column_b) AS column_b
          FROM tablename
          GROUP BY column_a) b
    WHERE a.column_a = b.column_a
      AND a.column_b = b.column_b
)
Thanks in advance!
select distinct on (col_a)
    col_a as value, col_b as "date"
from t
order by col_a, col_b desc
See the PostgreSQL documentation for DISTINCT ON: it keeps only the first row of each set of rows sharing the same col_a, and the ORDER BY makes that first row the one with the latest col_b.
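DISTINCT ON is PostgreSQL-specific. On other engines, a minimal sketch of the standard window-function equivalent, under the same assumed table and column names:
SELECT col_a AS value, col_b AS "date"
FROM (
    SELECT col_a, col_b,
           ROW_NUMBER() OVER (PARTITION BY col_a ORDER BY col_b DESC) AS rn
    FROM t
) ranked
WHERE rn = 1;
Either form can typically be satisfied by a single sort, or by an index on (col_a, col_b DESC), rather than the join plus aggregation of the original query.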

Selecting an actual MIN value instead of NULL in SQL query

From this table:
Select * into #tmp from (
select 'a' A, 'b' B, NULL C union all
select 'aa' A, 'ab' B, 1 C union all
select 'aaa' A, 'bbb' B, 2 C ) x
I'd like to get this result:
A B Val
a b 1
aa ab 1
aaa bbb 2
That is, take the non-null min value and replace the NULL.
I suppose I could join the table to a filtered version of itself where no nulls appear. But that seems overkill. I thought this might be able to be done in the MIN Aggregate clause itself.
Any ideas would be helpful, thanks!
declare @null int
select @null = MIN(c) from #tmp
select A,
       B,
       ISNULL(c, @null) as val1
from #tmp
or
select A,
B,
ISNULL(c,(select MIN(c) from #tmp)) as val1
from #tmp
EDIT: I originally wrote, "You want something like ISNULL(c, MIN(c)), but that's not possible."
But I was wrong; it is possible. I was missing something in my syntax, so #kiki47's answer is exactly what you are asking for.
I wouldn't phrase it as "join the table to a filtered version of itself where no nulls appear," but more or less you can get the min and then use it.
In one go:
WITH cte AS (
SELECT MIN(c) minVal FROM #tmp WHERE c IS NOT NULL
)
SELECT a, b, ISNULL(c, cte.minVal)
FROM #tmp
CROSS JOIN cte
or maybe simpler (but may optimize to the same thing):
DECLARE @minVal INTEGER
SELECT @minVal = MIN(c) FROM #tmp WHERE c IS NOT NULL
SELECT a, b, ISNULL(c, @minVal) FROM #tmp
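Following up on the question's idea of doing it "in the MIN Aggregate clause itself": a windowed MIN can do exactly that in one statement. A minimal sketch against the same #tmp table (MIN already ignores NULLs, so no extra filter is needed):
SELECT a, b, ISNULL(c, MIN(c) OVER ()) AS val
FROM #tmp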