T-SQL query to find equal row values along columns

I have this table:
col 1   col 2   col 3   ....   col N
-------------------------------------
1       A       B              fooa
10      A       foo            cc
4       A       B              fooa
Is it possible, with a T-SQL query, to return a single row that has a value only in the columns where ALL the rows share the same value?
col 1   col 2   col 3   ....   col N
-------------------------------------
--      A       --             --

SELECT
    CASE WHEN COUNT(col1) = COUNT(*) AND MIN(col1) = MAX(col1) THEN MIN(col1) END AS col1,
    CASE WHEN COUNT(col2) = COUNT(*) AND MIN(col2) = MAX(col2) THEN MIN(col2) END AS col2,
    ...
FROM yourtable
You have to allow for NULLs in the column:
COUNT(*) counts them
COUNT(col1) doesn't count them
That is, a column with a mix of As and NULLs isn't a single value. MIN and MAX would both give A, because they ignore NULLs.
Edit:
removed DISTINCT so the counts match for the NULL check
added the MIN/MAX check (as per Mark Byers' deleted answer) to verify uniqueness
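To see the NULL handling concretely, here is a minimal self-contained sketch (the rows mirror the sample table above, plus an invented colX that mixes a value with NULL):
-- col2 is uniformly 'A', col3 has mixed values, colX mixes 'x' with NULL.
-- Only col2 comes back non-NULL: col3 fails the MIN/MAX check, and colX
-- fails the COUNT check because COUNT(colX) < COUNT(*).
SELECT
    CASE WHEN COUNT(col2) = COUNT(*) AND MIN(col2) = MAX(col2) THEN MIN(col2) END AS col2,
    CASE WHEN COUNT(col3) = COUNT(*) AND MIN(col3) = MAX(col3) THEN MIN(col3) END AS col3,
    CASE WHEN COUNT(colX) = COUNT(*) AND MIN(colX) = MAX(colX) THEN MIN(colX) END AS colX
FROM (VALUES ('A', 'B', 'x'),
             ('A', 'foo', NULL),
             ('A', 'B', 'x')) AS v(col2, col3, colX);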

Related

Get the ID of a table and its modulo with respect to the total rows in the same table in Postgres

While trying to map some data to a table, I wanted to obtain the ID of each row and that ID modulo the total number of rows in the same table. For example, given this table:
id
--
1
3
10
12
I would like this result:
id | mod
---+----
 1 |  1   <- 1 mod 4
 3 |  3   <- 3 mod 4
10 |  2   <- 10 mod 4
12 |  0   <- 12 mod 4
Is there an easy way to achieve this dynamically (as in, not counting the rows beforehand, but doing it in an atomic way)?
So far I've tried something like this:
SELECT t1.id, t1.id % COUNT(t1.id) mod FROM tbl t1, tbl t2 GROUP BY t1.id;
This works, but you must have the GROUP BY and the tbl t2 reference; otherwise it returns 0 for the mod column. That makes sense, because I think it works by cross-joining the table with itself, so each ID gets a full copy of the table. I guess for small enough tables this is OK, but I can see how this becomes problematic for larger tables.
Edit: Found another hack-ish way:
WITH total AS (
    SELECT COUNT(*) AS cnt FROM tbl
)
SELECT t1.id, t1.id % t2.cnt AS mod
FROM tbl t1, total t2
It's similar to the previous query, but it "collapses" the multiplication into a single row holding the precomputed count.
You can use the COUNT() window function:
SELECT id,
       id % COUNT(*) OVER () AS mod
FROM tbl;
I'm sure that the optimizer is smart enough to calculate the result of the window function only once.

Spark: use a self-reference in the calculation for a column

I have a data frame like the one given below; essentially it is a time-series-derived data frame.
My issue is that the formula for the n-th row of Col C is:
C(n) = (A(n) - A(n-1)) + C(n-1)
Hence the calculation of Col C is self-referencing a previous value of Col C. I am using Spark SQL; can someone please advise how to proceed with this? For the Col A difference I am using the LAG function.
It seems colC is just colA minus colA in the first row, e.g.:
 1 = 6 - 5
 0 = 5 - 5
 2 = 7 - 5
 3 = 8 - 5
-2 = 3 - 5
So this query should work (tbl here stands for whatever name your data frame is registered under):
SELECT colA, colA - FIRST(colA) OVER (ORDER BY id) AS colC FROM tbl
Your formula is a cumulative sum of the row-to-row differences. Here is a complete example:
SELECT rowid, a, SUM(c0) OVER (ORDER BY rowid) AS c
FROM
(
    SELECT rowid, a,
           -- COALESCE turns the first row's NULL difference into 0
           COALESCE(a - LAG(a, 1) OVER (ORDER BY rowid), 0) AS c0
    FROM
    (
        SELECT 1 AS rowid, 5 AS a UNION ALL
        SELECT 2 AS rowid, 6 AS a UNION ALL
        SELECT 3 AS rowid, 5 AS a UNION ALL
        SELECT 4 AS rowid, 7 AS a UNION ALL
        SELECT 5 AS rowid, 8 AS a UNION ALL
        SELECT 6 AS rowid, 3 AS a
    ) t
) t
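On the six sample rows this yields:
rowid  a   c
-----  --  --
1      5    0
2      6    1
3      5    0
4      7    2
5      8    3
6      3   -2
which matches the 1, 0, 2, 3, -2 sequence worked out in the first answer, with 0 for the starting row.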

TSQL: update value to 1 if max value between two tables, else 0

I have two tables
TABLE 1 : Stage_product
PRODUCT_ID   SYS_ROWDATETIMEUTC
----------   -----------------------
1            2015-03-13 06:09:30.040
2            ....
3
TABLE 2 : DIM_Product
PRODUCT_ID   SYS_ROWSTARTDATETIMEUTC   SYS_ROWISCURRENT
----------   -----------------------   ----------------
1            2014-03-13 06:09:30.040   0
2            2015-03-13 06:09:30.040   1
I want to write an UPDATE statement so that, if the value of SYS_ROWDATETIMEUTC in the first table is more recent than the value of SYS_ROWSTARTDATETIMEUTC in the second table, then SYS_ROWISCURRENT in the second table is set to 0, else 1.
You can use the following query:
UPDATE t2
SET t2.SYS_ROWISCURRENT = CASE WHEN t1.SYS_ROWDATETIMEUTC > t2.SYS_ROWSTARTDATETIMEUTC
                               THEN 0
                               ELSE 1
                          END
FROM DIM_Product t2
INNER JOIN Stage_product t1 ON t2.PRODUCT_ID = t1.PRODUCT_ID;
I assume you want to compare dates between the two tables for the same product.
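Note that the INNER JOIN only touches products present in both tables. If dimension rows with no staging match should also be flagged as current, a LEFT JOIN variant would cover them; a sketch, assuming the same schema:
-- Unmatched rows get NULL for t1.SYS_ROWDATETIMEUTC, so the CASE falls
-- through to ELSE and marks them as current (1).
UPDATE t2
SET t2.SYS_ROWISCURRENT = CASE WHEN t1.SYS_ROWDATETIMEUTC > t2.SYS_ROWSTARTDATETIMEUTC
                               THEN 0
                               ELSE 1
                          END
FROM DIM_Product t2
LEFT JOIN Stage_product t1 ON t2.PRODUCT_ID = t1.PRODUCT_ID;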

Postgres function to attach random integers to selected rows

I want a function or a trigger such that, when a row is inserted, all rows with matching criteria are given a random integer between 1 and the number of such rows, so as to randomise the rows on a SELECT.
E.g. if I have the data
Col1   Col2   Order
----   ----   -----
A      1
B      2
B      2
B      3
A      2
and I insert another row with Col1 = B and Col2 = 2, then I want to end up with
Col1   Col2   Order
----   ----   -----
A      1
B      2      2
B      2      3
B      3
A      2
B      2      1
where Order is a number from 1 up to the count of matching rows, with each number appearing only once?
There is no need to store this, you can generate such a number when you retrieve the data.
select col1,
       col2,
       row_number() over (partition by col1, col2 order by random()) as random_order
from the_table
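If the order really does have to be stored, as the question's mention of a trigger suggests, the same row_number() result can be persisted; a sketch, assuming the column is named "Order" (quoted because ORDER is a reserved word) and using the system column ctid as a row identifier, since the sample table shows no key:
UPDATE the_table t
SET "Order" = rnd.rn
FROM (
    SELECT ctid,
           row_number() OVER (PARTITION BY col1, col2 ORDER BY random()) AS rn
    FROM the_table
) rnd
WHERE t.ctid = rnd.ctid;  -- ctid: Postgres's physical row address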

Summing From Consecutive Rows

Assume we have a table and we want to compute a sum of the Expend column so that the summation only adds up values with the same Week_Name.
SN  Week_Name  Exp  Sum
--  ---------  ---  ---
1   Week 1     10    0
2   Week 1     20    0
3   Week 1     30   60
4   Week 2     40    0
5   Week 2     50   90
6   Week 3     10    0
I will assume we need to ORDER BY Week_Name, then compare the previous row's Week_Name with the current row's Week_Name.
If both are the same, put zero in the Sum column.
If they are not the same, add up all expenditure where Week_Name equals the previous row's Week_Name and place that in the Sum column. The final output should look like the table above.
Any help on how to achieve this in T-SQL is highly appreciated.
Okay, I was eventually able to resolve this issue, praise Jesus! If you want the exact table I gave above, you can use GilM's response below; it is perfect. If you want your table to have running cumulatives, i.e. Row 3 should have 60, Row 5 should have 150, Row 6 160, etc., then you can use my code below:
USE CAPdb

IF OBJECT_ID('dbo.[tablebp]') IS NOT NULL
    DROP TABLE [tablebp]
GO

CREATE TABLE [tablebp] (
    tablebpcCol1 int PRIMARY KEY
    ,tabledatekey datetime
    ,tableweekname varchar(50)
    ,expenditure1 numeric
    ,expenditure_Cummulative numeric
)

INSERT INTO [tablebp] (tablebpcCol1, tabledatekey, tableweekname, expenditure1, expenditure_Cummulative)
SELECT b.s_tablekey, d.PK_Date, d.Week_Name,
       SUM(b.s_expenditure1) AS s_expenditure1,
       SUM(b.s_expenditure1) + COALESCE((SELECT SUM(s_expenditure1)
                                         FROM source_table bs
                                         JOIN dbo.Time dd ON bs.[DATE Key] = dd.[PK_Date]
                                         WHERE dd.PK_Date < d.PK_Date), 0)
FROM source_table b
INNER JOIN dbo.Time d ON b.[Date key] = d.PK_Date
GROUP BY d.[PK_Date], d.Week_Name, b.s_tablekey, b.s_expenditure1
ORDER BY d.[PK_Date]
;WITH CTE AS (
    SELECT tableweekname
          ,MAX(expenditure_Cummulative) AS Week_expenditure_Cummulative
          ,MAX(tablebpcCol1) AS MaxSN
    FROM [tablebp]
    GROUP BY tableweekname
)
SELECT [tablebp].*
      ,CASE WHEN [tablebp].tablebpcCol1 = CTE.MaxSN THEN Week_expenditure_Cummulative
            ELSE 0 END AS [RunWeeklySum]
FROM [tablebp]
JOIN CTE ON CTE.tableweekname = [tablebp].tableweekname
I'm not sure why your SN=6 line is 0 rather than 10. Do you really not want the sum for the last week? If having the last week's total is okay, then you might want something like:
;WITH CTE AS (
    SELECT Week_Name, SUM([Expend.]) AS SumExpend
          ,MAX(SN) AS MaxSN
    FROM T
    GROUP BY Week_Name
)
SELECT T.*, CASE WHEN T.SN = CTE.MaxSN THEN SumExpend
                 ELSE 0 END AS [Sum]
FROM T
JOIN CTE ON CTE.Week_Name = T.Week_Name
Based on the request in the comment asking for a running total in Sum, you could try this:
;WITH CTE AS (
    SELECT Week_Name, MAX(SN) AS MaxSN
    FROM T
    GROUP BY Week_Name
)
SELECT T.SN, T.Week_Name, T.Exp,
       CASE WHEN T.SN = CTE.MaxSN THEN
            (SELECT SUM(Exp) FROM T T2
             WHERE T2.SN <= T.SN)
       ELSE 0 END AS [SUM]
FROM T
JOIN CTE ON CTE.Week_Name = T.Week_Name
ORDER BY SN
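On SQL Server 2012 and later, the correlated subquery can be replaced with a windowed running total; a sketch, assuming the same table T with columns SN, Week_Name, and Exp:
-- Emit the running total only on the last row of each week, 0 elsewhere.
SELECT SN, Week_Name, Exp,
       CASE WHEN SN = MAX(SN) OVER (PARTITION BY Week_Name)
            THEN SUM(Exp) OVER (ORDER BY SN ROWS UNBOUNDED PRECEDING)
            ELSE 0
       END AS [SUM]
FROM T
ORDER BY SN;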