Input:
Name GroupId Processed NewGroupId NgId
Mike 1 N 9 NULL
Mikes 1 N 9 NULL
Miken 5 Y 9 5
Mikel 5 Y 9 5
Output:
Name GroupId Processed NewGroupId NgId
Mike 1 N 9 5
Mikes 1 N 9 5
Miken 5 Y 9 5
Mikel 5 Y 9 5
below query worked in sql server, due to correlated subquery same is not working in spark sql.
Is there any alternate either with spark sql or pyspark dataframe.
SELECT Name,groupid,IsProcessed,ngid,
CASE WHEN ngid IS NULL THEN
COALESCE((SELECT top 1 ngid FROM temp D
WHERE D.NewGroupId = T.NewGroupId AND
D.ngid IS NOT NULL ), null)
ELSE ngid
END AS ngid
FROM temp T
worked with below in sparksql.
spark.sql("select LKUP,groupid,IsProcessed,NewGroupId ,coalesce((select Max(D.ngid) from test2 D where D.NewGroupId = T.NewGroupId AND D.ngidis not null),null) as ngid from test2 T")
I am trying to write PIVOT to generate a row of data that originally sits as multiple rows in the DB. The DB data looks like this (appended)
txtSchoolID txtSubjectArchivedName intSubjectID intGradeID intGradeTransposeValue
95406288448 History 7 634 2
95406288448 History 7 635 2
95406288448 History 7 636 2
95406288448 History 7 637 2
95406288448 History 7 638 2
95406288448 History 7 639 2
95406288448 History 7 640 2
95406288448 History 7 641 2
95406288448 History 7 642 2
95406288448 History 7 643 2
What I want to get to is 1 row for each subject and SchoolID with the grades listed as columns.
I have written the following pivot:
SELECT intSubjectID, txtSchoolID, [636] AS Effort, [637] AS Focus, [638] AS Participation, [639] AS Groupwork, [640] AS Rigour, [641] AS Curiosity, [642] AS Initiative,
[643] AS SelfOrganisation, [644] as Perserverance
FROM (SELECT txtSchoolID, intReportTypeID, txtSubjectArchivedName, intSubjectID, intReportProgress, txtTitle, txtForename, txtPreName, txtMiddleNames,
txtSurname, txtGender, txtForm, intNCYear, txtSubmitByTitle, txtSubmitByPreName, txtSubmitByFirstname, txtSubmitByMiddleNames,
txtSubmitBySurname, txtCurrentSubjectName, txtCurrentSubjectReportName, intReportCycleID, txtReportCycleName, intReportCycleType,
intPreviousReportCycle, txtReportCycleShortName, intReportCycleTerm, intReportCycleAcademicYear, dtReportCycleStartDate,
dtReportCycleFinishDate, dtReportCyclePrintDate, txtReportTermName, dtReportTermStartDate, dtReportTermFinishDate,
intGradeID, txtGradingName, txtGradingOptions, txtShortGradingName, txtGrade, intGradeTransposeValue FROM VwReportsManagementAcademicReports) p
PIVOT
(MAX (intGradeTransposeValue)
FOR intGradeID IN ([636], [637], [638], [639], [640], [641], [642], [643], [644] )
) AS pvt
WHERE (intReportCycleID = 142) AND (intReportProgress = 1)
However, this is producing this
intSubjectID txtSchoolID Effort Focus Participation Groupwork Rigour Curiosity Initiative SelfOrganisation Perserverance
8 74001484142 NULL NULL NULL NULL NULL NULL NULL NULL NULL
8 74001484142 NULL NULL NULL NULL NULL 2 NULL NULL NULL
8 74001484142 3 NULL NULL NULL NULL NULL NULL NULL NULL
8 74001484142 NULL 2 NULL NULL NULL NULL NULL NULL NULL
8 74001484142 NULL NULL NULL 2 NULL NULL NULL NULL NULL
8 74001484142 NULL NULL NULL NULL NULL NULL 2 NULL NULL
8 74001484142 NULL NULL 2 NULL NULL NULL NULL NULL NULL
8 74001484142 NULL NULL NULL NULL NULL NULL NULL NULL 2
8 74001484142 NULL NULL NULL NULL 2 NULL NULL NULL NULL
8 74001484142 NULL NULL NULL NULL NULL NULL NULL 2 NULL
What I want is
intSubjectID txtSchoolID Effort Focus Participation Groupwork Rigour Curiosity Initiative SelfOrganisation Perserverance
8 74001484142 3 2 2 2 2 2 2 2 2
Is there a way to get it like this.
I have never tried a PIVOT before, this is my first time, so all help welcome.
I think the reason you got the unexpected result is you have so many unwanted columns in the Select in the sub-query and the pivot will group them, too.
Your query might be very close to your ideal result: try:
SELECT intSubjectID, txtSchoolID, [636] AS Effort, [637] AS Focus, [638] AS Participation, [639] AS Groupwork, [640] AS Rigour, [641] AS Curiosity, [642] AS Initiative,
[643] AS SelfOrganisation, [644] as Perserverance
FROM (SELECT txtSchoolID, intReportTypeID FROM VwReportsManagementAcademicReports) p --just these two
PIVOT
(MAX (intGradeTransposeValue)
FOR intGradeID IN ([636], [637], [638], [639], [640], [641], [642], [643], [644] )
) AS pvt
WHERE (intReportCycleID = 142) AND (intReportProgress = 1)
As per the comment above - the solution was:
try stripping the inner select down to only the columns that will be used in the pivot and are expected in the output. intSubjectID, txtSchoolID, intGradeTransposeValue, and intGradeID. all other columns will act as a grouping column in the output and can cause this type of non grouped output.
pivot can't return such what you asking for, but you can use another approach:
--test dataset
declare #test as table
( txtSchoolID bigint,
txtSubjectArchivedName varchar(10),
intSubjectID int,
intGradeID int,
intGradeTransposeValue int)
insert into #test
Values
(95406288448,'History',7,634,2),
(95406288448,'History',7,635,2),
(95406288448,'History',7,636,2),
(95406288448,'History',7,637,2),
(95406288448,'History',7,638,2),
(95406288448,'History',7,639,2),
(95406288448,'History',7,640,2),
(95406288448,'History',7,641,2),
(95406288448,'History',7,642,2),
(95406288448,'History',7,643,2)
--conditional aggregation
select intSubjectID,
txtSchoolID,
count(case when intGradeID = 636 then 1 end) AS Effort,
count(case when intGradeID = 637 then 1 end) AS Focus,
count(case when intGradeID = 638 then 1 end) AS Participation,
count(case when intGradeID = 639 then 1 end) AS Groupwork,
count(case when intGradeID = 640 then 1 end) AS Rigour,
count(case when intGradeID = 641 then 1 end) AS Curiosity,
count(case when intGradeID = 642 then 1 end) AS Initiative,
count(case when intGradeID = 643 then 1 end) AS SelfOrganisation,
count(case when intGradeID = 644 then 1 end) as Perserverance
from #test
group by intSubjectID, txtSchoolID
test is here
Suppose I have data formatted in the following way (FYI, total row count is over 30K):
customer_id order_date order_rank
A 2017-02-19 1
A 2017-02-24 2
A 2017-03-31 3
A 2017-07-03 4
A 2017-08-10 5
B 2016-04-24 1
B 2016-04-30 2
C 2016-07-18 1
C 2016-09-01 2
C 2016-09-13 3
I need a 4th column, let's call it days_since_last_order which, in the case where order_rank = 1 then 0 else calculate the number of days since the previous order (with rank n-1).
So, the above would return:
customer_id order_date order_rank days_since_last_order
A 2017-02-19 1 0
A 2017-02-24 2 5
A 2017-03-31 3 35
A 2017-07-03 4 94
A 2017-08-10 5 38
B 2016-04-24 1 0
B 2016-04-30 2 6
C 2016-07-18 1 79
C 2016-09-01 2 45
C 2016-09-13 3 12
Is there an easier way to calculate the above with a window function (or similar) rather than join the entire dataset against itself (eg. on A.order_rank = B.order_rank - 1) and doing the calc?
Thanks!
use the lag window function
SELECT
customer_id
, order_date
, order_rank
, COALESCE(
DATE(order_date)
- DATE(LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date))
, 0)
FROM <table_name>
I have one system that read from two client databases. For the two clients, both of them have different format of cut off date:
1) Client A: Every month at 15th. Example: 15-12-2016.
2) Client B: Every first day of the month. Example: 1-1-2017.
The cut off date are stored in the table as below:
Now I need a single query to retrieve the current month's cut off date of the client. For instance, today is 15-2-2017, so the expected cut off date for both clients should be as below:
1) Client A: 15-1-2017
2) Client B: 1-2-2017
How can I accomplish this in a single Stored Procedure? For client B, I can always get the first day of the month. But this can't apply to client A since their cut off is last month's date.
Might be something like this you are looking for:
DECLARE #DummyClient TABLE(ID INT IDENTITY,ClientName VARCHAR(100));
DECLARE #DummyDates TABLE(ClientID INT,YourDate DATE);
INSERT INTO #DummyClient VALUES
('A'),('B');
INSERT INTO #DummyDates VALUES
(1,{d'2016-12-15'}),(2,{d'2017-01-01'});
WITH Numbers AS
( SELECT 0 AS Nr
UNION ALL SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5
UNION ALL SELECT 6
UNION ALL SELECT 7
UNION ALL SELECT 9
UNION ALL SELECT 10
UNION ALL SELECT 11
UNION ALL SELECT 12
UNION ALL SELECT 13
UNION ALL SELECT 14
UNION ALL SELECT 15
UNION ALL SELECT 16
UNION ALL SELECT 17
UNION ALL SELECT 18
UNION ALL SELECT 19
UNION ALL SELECT 20
UNION ALL SELECT 21
UNION ALL SELECT 22
UNION ALL SELECT 23
UNION ALL SELECT 24
)
,ClientExt AS
(
SELECT c.*
,MIN(d.YourDate) AS MinDate
FROM #DummyClient AS c
INNER JOIN #DummyDates AS d ON c.ID=d.ClientID
GROUP BY c.ID,c.ClientName
)
SELECT ID,ClientName,D
FROM ClientExt
CROSS APPLY(SELECT DATEADD(MONTH,Numbers.Nr,MinDate)
FROM Numbers) AS RunningDate(D);
The result
ID Cl Date
1 A 2016-12-15
1 A 2017-01-15
1 A 2017-02-15
1 A 2017-03-15
1 A 2017-04-15
1 A 2017-05-15
1 A 2017-06-15
1 A 2017-07-15
1 A 2017-09-15
1 A 2017-10-15
1 A 2017-11-15
1 A 2017-12-15
1 A 2018-01-15
1 A 2018-02-15
1 A 2018-03-15
1 A 2018-04-15
1 A 2018-05-15
1 A 2018-06-15
1 A 2018-07-15
1 A 2018-08-15
1 A 2018-09-15
1 A 2018-10-15
1 A 2018-11-15
1 A 2018-12-15
2 B 2017-01-01
2 B 2017-02-01
2 B 2017-03-01
2 B 2017-04-01
2 B 2017-05-01
2 B 2017-06-01
2 B 2017-07-01
2 B 2017-08-01
2 B 2017-10-01
2 B 2017-11-01
2 B 2017-12-01
2 B 2018-01-01
2 B 2018-02-01
2 B 2018-03-01
2 B 2018-04-01
2 B 2018-05-01
2 B 2018-06-01
2 B 2018-07-01
2 B 2018-08-01
2 B 2018-09-01
2 B 2018-10-01
2 B 2018-11-01
2 B 2018-12-01
2 B 2019-01-01
Say I had an example like so, where Im transposing columns into rows with UNPIVOT.
DECLARE #pvt AS TABLE (VendorID int, Emp1 int, Emp2 int, Emp3 int, Emp4 int, Emp5 int);
INSERT INTO #pvt (VendorId,Emp1,Emp2,Emp3,Emp4,Emp5) VALUES (1,4,3,5,4,4);
INSERT INTO #pvt (VendorId,Emp1,Emp2,Emp3,Emp4,Emp5) VALUES (2,4,1,5,5,5);
INSERT INTO #pvt (VendorId,Emp1,Emp2,Emp3,Emp4,Emp5) VALUES (3,4,3,5,4,4);
INSERT INTO #pvt (VendorId,Emp1,Emp2,Emp3,Emp4,Emp5) VALUES (4,4,2,5,5,4);
INSERT INTO #pvt (VendorId,Emp1,Emp2,Emp3,Emp4,Emp5) VALUES (5,5,1,5,5,5);
--Unpivot the table.
SELECT VendorID, Employee, Orders
FROM
(SELECT VendorID, Emp1, Emp2, Emp3, Emp4, Emp5
FROM #pvt) p
UNPIVOT
(Orders FOR Employee IN
(Emp1, Emp2, Emp3, Emp4, Emp5)
)AS unpvt;
GO
Which produces results like this
VendorID Employee Orders
1 Emp1 4
1 Emp2 3
1 Emp3 5
1 Emp4 4
1 Emp5 4
2 Emp1 4
2 Emp2 1
2 Emp3 5
2 Emp4 5
2 Emp5 5
3 Emp1 4
3 Emp2 3
3 Emp3 5
3 Emp4 4
3 Emp5 4
However, I want to include an "incremental date like so that it repeats in a group for each Vendor and the results would be like this
VendorID Employee Orders OrderDate
1 Emp1 4 01/01/2014
1 Emp2 3 02/01/2014
1 Emp3 5 03/01/2014
1 Emp4 4 04/01/2014
1 Emp5 4 05/01/2014
2 Emp1 4 ..
2 Emp2 1
2 Emp3 5
2 Emp4 5
2 Emp5 5
3 Emp1 4
3 Emp2 3
3 Emp3 5
3 Emp4 4
3 Emp5 4
The kicker is that I want to try to do this without resorting to a loop since the transposed results are going to be about 100K records. Is there a way to generate that date field like that without looping over the results?
[edit]
I think, but not sure yet, that [this]1 post might help, using ROW NUMBER
You can use:
Dateadd(DAY, row_number() over( partition by VendorId Order by Employee), #stardate)
According to your example you can partition by vendorId and order by Employee. But you can change just like a regular order by.