Let's say I have a table like this:
Task   Type  Variable  Hours  Duration
One    A     X            10         5
One    A     Y            40        15
One    B     X           100        29
Two    A     X             5         2
Two    B     X            15         9
Two    A     Y            60        17
Three  A     Y            18         5
Where the combination of task-type-variable makes each row unique.
How can I get a pivot table like the following:
                     X    Y
One    A  Hours     10   40
          Duration   5   15
One    B  Hours    100    0
          Duration  29    0
Two    A  Hours      5   60
          Duration   2   17
Two    B  Hours     15    0
          Duration   9    0
Three  A  Hours      0   18
          Duration   0    5
Is this even possible in SQL? I know Excel can do this.
This is really an UNPIVOT followed by a PIVOT. The following code achieves the desired results in a single query.
CREATE TABLE #t (
Task varchar(5),
Type char(1),
Variable char(1),
Hours int,
Duration int
)
INSERT INTO #t
VALUES
('One', 'A', 'X', 10, 5),
('One', 'A', 'Y', 40, 15),
('One', 'B', 'X', 100, 29),
('Two', 'A', 'X', 5, 2),
('Two', 'B', 'X', 15, 9),
('Two', 'A', 'Y', 60, 17),
('Three', 'A', 'Y', 18, 5)
SELECT
P.Task,
P.Type,
CAST(P.Property AS varchar(8)) AS Property,
COALESCE(P.X, 0) AS X,
COALESCE(P.Y, 0) AS Y
FROM #t AS T
UNPIVOT (
Value FOR Property IN (
Hours,
Duration
)
) AS U
PIVOT (
SUM(Value) FOR Variable IN (
X,
Y
)
) AS P
This yields the following results.
Task Type Property X Y
----- ---- -------- ----------- -----------
One A Duration 5 15
One A Hours 10 40
One B Duration 29 0
One B Hours 100 0
Three A Duration 0 5
Three A Hours 0 18
Two A Duration 2 17
Two A Hours 5 60
Two B Duration 9 0
Two B Hours 15 0
As you can see, Hours and Duration appear in the opposite order from the requested output. I don't think there is any way to force an order using PIVOT alone. This could easily be remedied by joining to another table that maps each Property value to a sort order, as long as you also had some way to sort the other columns correctly first (Task, for instance, sorts alphabetically here, which puts Three before Two).
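The sort-order idea can also be sketched outside SQL. The Python below is only an illustration of the unpivot, pivot, and explicit-ordering steps, not the T-SQL solution itself; the `task_order` and `prop_order` lookups are hypothetical stand-ins for the sort-order table described above.

```python
from collections import defaultdict

# Sample rows from the question: (Task, Type, Variable, Hours, Duration).
rows = [
    ("One", "A", "X", 10, 5),
    ("One", "A", "Y", 40, 15),
    ("One", "B", "X", 100, 29),
    ("Two", "A", "X", 5, 2),
    ("Two", "B", "X", 15, 9),
    ("Two", "A", "Y", 60, 17),
    ("Three", "A", "Y", 18, 5),
]

# Unpivot: emit one record per measure (Hours, Duration) for every row.
unpivoted = []
for task, typ, variable, hours, duration in rows:
    unpivoted.append((task, typ, "Hours", variable, hours))
    unpivoted.append((task, typ, "Duration", variable, duration))

# Pivot: spread Variable into X and Y cells, defaulting missing cells to 0.
cells = defaultdict(lambda: {"X": 0, "Y": 0})
for task, typ, prop, variable, value in unpivoted:
    cells[(task, typ, prop)][variable] += value

# Hypothetical sort-order lookups (assumed for illustration only).
task_order = {"One": 1, "Two": 2, "Three": 3}
prop_order = {"Hours": 1, "Duration": 2}

# Order by Task, then Type, then Property, using the lookups above.
result = sorted(
    ((task, typ, prop, xy["X"], xy["Y"]) for (task, typ, prop), xy in cells.items()),
    key=lambda r: (task_order[r[0]], r[1], prop_order[r[2]]),
)

for record in result:
    print(*record)
```

With the lookups in place, Hours comes before Duration within each Task/Type pair, and Three sorts after Two.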
I have three columns in postgresql
No  total_car_sales  start_date     end_date
1   5                Jan-01-2022    Aug-03-2022
2   1                April-01-2022  July-03-2022
3   3                March-01-2022  May-03-2022
4   7                Jan-01-2022    July-03-2022
5   56               April-01-2022  April-25-2022
6   3                April-01-2022  Aug-04-2022
For example, row No. 1 runs from 'Jan-01-2022' to 'Aug-03-2022'. I want to count its sales only in the month of the end date, so it contributes 5 to August-2022.
Likewise, row No. 6 contributes 3 to August-2022.
I want to generate total_car_sales per month for the whole table, like this:
Months       total_car_sales
Jan-2022     0
Feb-2022     0
March-2022   0
April-2022   56
May-2022     3
June-2022    0
July-2022    8
August-2022  8
I have tried to use date_trunc(), but it did not work for this.
Any suggestions would be really appreciated.
Thank you.
Make a list of months (generate_series) and calculate total sales for each of them.
with the_table (no,total_car_sales,start_date,end_date) as
(
values
(1, 5, 'Jan-01-2022'::date, 'Aug-03-2022'::date),
(2, 1, 'April-01-2022', 'July-03-2022'),
(3, 3, 'March-01-2022', 'May-03-2022'),
(4, 7, 'Jan-01-2022', 'July-03-2022'),
(5, 56, 'April-01-2022', 'April-25-2022'),
(6, 3, 'April-01-2022', 'Aug-04-2022')
)
select
to_char(m, 'mon-yyyy') "month",
coalesce
(
(select sum(total_car_sales) from the_table where m = date_trunc('month', end_date)),
0
) total_car_sales
from generate_series ('2022-01-01', '2022-08-01', interval '1 month') m;
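The same logic can be checked by hand with a short Python sketch. It is only an illustration of the idea (list every month, then sum the rows whose end date falls in that month); the variable names are made up for the example.

```python
from datetime import date

# Rows from the question: (no, total_car_sales, start_date, end_date).
sales = [
    (1, 5, date(2022, 1, 1), date(2022, 8, 3)),
    (2, 1, date(2022, 4, 1), date(2022, 7, 3)),
    (3, 3, date(2022, 3, 1), date(2022, 5, 3)),
    (4, 7, date(2022, 1, 1), date(2022, 7, 3)),
    (5, 56, date(2022, 4, 1), date(2022, 4, 25)),
    (6, 3, date(2022, 4, 1), date(2022, 8, 4)),
]

# Equivalent of generate_series: the first day of each month, Jan to Aug 2022.
months = [date(2022, m, 1) for m in range(1, 9)]

# Equivalent of date_trunc('month', end_date): snap the end date to day 1.
# Months with no matching rows get 0, like the coalesce in the query.
totals = {
    month: sum(qty for _, qty, _, end in sales if end.replace(day=1) == month)
    for month in months
}

for month, total in totals.items():
    print(month.strftime("%Y-%m"), total)
```

Months with no matching end date still appear with a total of 0, because the month list, not the sales table, drives the output.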
I have table1 as below.
num  value
1    10
2    15
3    20
table2
ver  value
1.0  5
2.0  15
3.0  18
Output should be as below. I need to select all rows from table1 such that table1.value <= table2.value.
num  value
1    10
2    15
I tried this, but it's not working:
select from table1 where value <= (exec value from table2)
From a logical point of view what you're asking kdb to compare is:
10 15 20<=5 15 18
Because these are equal lengths, kdb assumes you mean pairwise comparison, aka
10<=5
15<=15
20<=18
to which it would return
q)10 15 20<=5 15 18
010b
What you actually seem to mean (based on your expected output) is 10 15 20<=max(5 15 18). So in that case you would want:
q)t1:([]num:1 2 3;val:10 15 20)
q)t2:([]ver:1 2 3.;val:5 15 18)
q)select from t1 where val<=exec max val from t2
num val
-------
1 10
2 15
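For readers less familiar with q, the difference between the pairwise comparison and the comparison against the maximum can be sketched in Python. This is only an analogy for what the q expressions above compute, not kdb+ code.

```python
# Values from the two tables in the question.
table1 = [{"num": 1, "val": 10}, {"num": 2, "val": 15}, {"num": 3, "val": 20}]
table2_vals = [5, 15, 18]

# Pairwise comparison: what 10 15 20<=5 15 18 does for equal-length lists.
pairwise = [row["val"] <= other for row, other in zip(table1, table2_vals)]

# Comparison against a single value: the maximum of table2's column.
threshold = max(table2_vals)
selected = [row for row in table1 if row["val"] <= threshold]

print(pairwise)
print(selected)
```

The pairwise version keeps only the middle row, whereas comparing against the maximum keeps the first two rows, which is the output the question asks for.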
As an aside, you shouldn't have a column called value, as it clashes with a keyword.
value is a keyword so don't assign to it.
Assuming you want all rows from table1 whose value is less than or equal to the max value in table2, you could do:
q)table1:([]num:til 3;val:10 15 20)
q)table2:([]ver:`float$til 3;val:5 15 18)
q)select from table1 where val<=max table2`val
num val
-------
0 10
1 15
I'm trying to sum a window with a filter. I saw something similar to
sum(x) filter(condition) over (partition by...)
but it does not seem to work in t-sql, SQL Server 2017.
Essentially, I want to sum the last 5 rows that have a condition on another column.
I've tried
sum(case when condition...) over (partition...)
and sum(cast(nullif(x))) over (partition...).
I've tried left joining the table with a where condition to filter out the condition.
All of the above take the last 5 rows counting back from the current row and only then apply the condition, so fewer than 5 values may end up in the sum.
What I want is, starting from the current row, to sum the last 5 values that meet the condition.
Date  | Value | Condition | Result
1-1     10      1
1-2     11      1
1-3     12      1
1-4     13      1
1-5     14      0
1-6     15      1
1-7     16      0
1-8     17      0           sum(15+13+12+11+10)
1-9     18      1           sum(18+15+13+12+11)
1-10    19      1           sum(19+18+15+13+12)
In the above example the condition I would want would be 1, ignoring the 0 but still having the "window" size be 5 non-0 values.
This can easily be achieved using a correlated subquery:
First, create and populate sample table (Please save us this step in your future questions):
CREATE TABLE #T
(
[Date] Date,
[Value] int,
Condition bit
)
INSERT INTO #T ([Date], [Value], Condition) VALUES
('2019-01-01', 10, 1),
('2019-01-02', 11, 1),
('2019-01-03', 12, 1),
('2019-01-04', 13, 1),
('2019-01-05', 14, 0),
('2019-01-06', 15, 1),
('2019-01-07', 16, 0),
('2019-01-08', 17, 0),
('2019-01-09', 18, 1),
('2019-01-10', 19, 1)
The query:
SELECT [Date], [Value], Condition,
(
SELECT Sum([Value])
FROM
(
SELECT TOP 5 [Value]
FROM #T AS t1
WHERE Condition = 1
AND t1.[Date] <= t0.[Date]
-- If you want the sum to appear only from a specific date onward, uncomment the next line
--AND t0.[Date] > '2019-01-07'
ORDER BY [Date] DESC
) As t2
HAVING COUNT(*) = 5 -- there are at least 5 rows meeting the condition
) As Result
FROM #T As T0
Results:
Date Value Condition Result
2019-01-01 10 1
2019-01-02 11 1
2019-01-03 12 1
2019-01-04 13 1
2019-01-05 14 0
2019-01-06 15 1 61
2019-01-07 16 0 61
2019-01-08 17 0 61
2019-01-09 18 1 69
2019-01-10 19 1 77
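As a cross-check of these numbers, the same rule can be sketched in Python with a single pass over the rows. Note that this swaps in a different technique (a bounded queue of the most recent qualifying values) for the correlated subquery; the function name `rolling_conditional_sum` is made up for the example.

```python
from collections import deque

# Rows from the sample table, ordered by date: (date, value, condition).
rows = [
    ("2019-01-01", 10, 1),
    ("2019-01-02", 11, 1),
    ("2019-01-03", 12, 1),
    ("2019-01-04", 13, 1),
    ("2019-01-05", 14, 0),
    ("2019-01-06", 15, 1),
    ("2019-01-07", 16, 0),
    ("2019-01-08", 17, 0),
    ("2019-01-09", 18, 1),
    ("2019-01-10", 19, 1),
]


def rolling_conditional_sum(rows, window=5):
    """Sum the last `window` values whose condition is 1, up to each row.

    Returns None for a row until `window` qualifying values have been seen,
    mirroring the HAVING COUNT(*) = 5 check in the query.
    """
    recent = deque(maxlen=window)  # keeps only the newest qualifying values
    results = []
    for _, value, condition in rows:
        if condition == 1:
            recent.append(value)
        results.append(sum(recent) if len(recent) == window else None)
    return results


results = rolling_conditional_sum(rows)
for (day, value, condition), total in zip(rows, results):
    print(day, value, condition, total)
```

The queue approach assumes the rows are already sorted by date; the subquery does that ordering itself with ORDER BY [Date] DESC.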
Apologies if this is a simple thing to achieve but after reading several similar posts, I cannot seem to find the right answer.
What I am basically trying to do is replicate the functionality of calculating an average over a group of records.
Below is a quick bit of SQL to demonstrate what I want to get to.
CREATE TABLE #T (CountryID int, CategoryID int, ProductID int, Price float)
INSERT INTO #T VALUES
(1,20, 300, 10),
(1,20, 301, 11),
(1,20, 302, 12),
(1,20, 303, 13),
(1,30, 300, 21),
(1,30, 300, 22),
(1,30, 300, 23),
(1,30, 300, 24),
(2,20, 300, 5),
(2,20, 301, 6),
(2,20, 302, 7),
(2,20, 303, 8),
(2,30, 300, 9),
(2,30, 300, 8),
(2,30, 300, 7),
(2,30, 300, 6)
SELECT
*
, AVG(Price) OVER(PARTITION BY CountryID, CategoryID) AS AvgPerCountryCategory
FROM #t
Which gives me the results I require ...
CountryID CategoryID ProductID Price AvgPerCountryCategory
1 20 300 10 11.5
1 20 301 11 11.5
1 20 302 12 11.5
1 20 303 13 11.5
1 30 300 21 22.5
1 30 300 22 22.5
1 30 300 23 22.5
1 30 300 24 22.5
2 20 300 5 6.5
2 20 301 6 6.5
2 20 302 7 6.5
2 20 303 8 6.5
2 30 300 9 7.5
2 30 300 8 7.5
2 30 300 7 7.5
2 30 300 6 7.5
As you can see, each row now shows the average Price for the respective Country/Category. At a later stage this will be used to calculate a variance from this average, but for now I'd just like to get to this point and try to work out the next steps myself.
So what would be the equivalent of AVG(Price) OVER(PARTITION BY CountryID, CategoryID) in DAX?
The plan is that the result will also take into account any filters that are applied to the data in Power BI. I'm not sure if this is important at this stage. However, this does mean that doing this work in SQL is probably not an option.
I'm very new to DAX, so an explanation of any suggested expression would also be very welcome.
You can create a new calculated column that gives you this as follows:
AvgPerCountryCategory =
CALCULATE(AVERAGE('#T'[Price]),
ALLEXCEPT('#T', '#T'[CountryID], '#T'[CategoryID]))
This says: take the average over all rows whose CountryID and CategoryID match the values in the current row. (CALCULATE turns the current row into filters, and ALLEXCEPT then removes every filter except those on these two columns.)
This is equivalent to this version:
AvgPerCountryCategory =
CALCULATE(AVERAGE('#T'[Price]),
ALL('#T'[ProductID], '#T'[Price]))
This time we're telling it which filters to remove rather than which to keep.
Another way would be to remove all the filters and then add back the parts you want explicitly:
AvgPerCountryCategory =
CALCULATE(AVERAGE('#T'[Price]),
ALL('#T'),
'#T'[CountryID] = EARLIER('#T'[CountryID]),
'#T'[CategoryID] = EARLIER('#T'[CategoryID]))
The EARLIER function refers to the earlier row context.
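To make the intent of the calculated-column versions concrete, here is a small Python sketch of the same computation: group the rows by CountryID and CategoryID, average Price per group, and attach that average to every row. It illustrates the result only and says nothing about how the DAX engine evaluates the expressions.

```python
from collections import defaultdict

# Rows from the question: (CountryID, CategoryID, ProductID, Price).
rows = [
    (1, 20, 300, 10), (1, 20, 301, 11), (1, 20, 302, 12), (1, 20, 303, 13),
    (1, 30, 300, 21), (1, 30, 300, 22), (1, 30, 300, 23), (1, 30, 300, 24),
    (2, 20, 300, 5), (2, 20, 301, 6), (2, 20, 302, 7), (2, 20, 303, 8),
    (2, 30, 300, 9), (2, 30, 300, 8), (2, 30, 300, 7), (2, 30, 300, 6),
]

# Collect the prices of every (CountryID, CategoryID) group.
prices = defaultdict(list)
for country, category, _, price in rows:
    prices[(country, category)].append(price)

# One average per group, like PARTITION BY CountryID, CategoryID.
averages = {key: sum(vals) / len(vals) for key, vals in prices.items()}

# Attach the group average to each row, like a calculated column.
with_avg = [row + (averages[(row[0], row[1])],) for row in rows]

for row in with_avg:
    print(*row)
```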
Edit:
The code above is written for calculated columns. For a measure, I'd recommend:
AvgPerCountryCategory =
CALCULATE (
AVERAGE ( '#T'[Price] ),
ALLSELECTED ( '#T' ),
SUMMARIZE (
'#T',
'#T'[CategoryID],
'#T'[CountryID]
)
)
Is there a way to detect subseries of zeros of length at least 3 within a time series in Postgres?
year value
--------------
1 0
2 0
3 0
4 33
5 72
6 0
7 0
8 0
9 0
10 25
11 0
12 56
13 37
So in this example I'd like to return years 1-3 and 6-9, but not year 11.
This one will do it:
WITH d(y,v) AS (VALUES
(1,0),(2,0),(3,0),(4,33),(5,72),
(6,0),(7,0),(8,0),(9,0),(10,25),
(11,0),(12,56),(13,37)
)
SELECT grp, numrange(min(y),max(y),'[]') as ys, count(*) as len
FROM (
/* group identifiers via running total */
SELECT y, v, g, sum(g) OVER (ORDER BY y) grp
FROM (
/* group boundaries */
SELECT y, v, CASE WHEN
v IS DISTINCT FROM lag(v) OVER (ORDER BY y) THEN 1
END g
FROM d) s
WHERE v=0) s
GROUP BY grp
HAVING count(*) >= 3;
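The grouping idea behind the query (split the series into runs of equal neighbours, then keep the zero runs that are long enough) can be sketched in Python. This uses `itertools.groupby` in place of the lag-and-running-total trick, so it shows the expected result and not the SQL mechanics.

```python
from itertools import groupby

# (year, value) pairs from the question, ordered by year.
series = [
    (1, 0), (2, 0), (3, 0), (4, 33), (5, 72),
    (6, 0), (7, 0), (8, 0), (9, 0), (10, 25),
    (11, 0), (12, 56), (13, 37),
]

MIN_LENGTH = 3  # shortest run of zeros to report

runs = []
# groupby splits the ordered series into consecutive zero / non-zero runs.
for is_zero, group in groupby(series, key=lambda pair: pair[1] == 0):
    years = [year for year, _ in group]
    if is_zero and len(years) >= MIN_LENGTH:
        runs.append((years[0], years[-1], len(years)))

for first, last, length in runs:
    print(f"years {first}-{last} (length {length})")
```

As in the SQL version, the single zero at year 11 forms a run of length 1 and is dropped.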