I have a data set with Student names, Skills, and their scores in these skills by year.
I want a query that finds which student has had the highest growth in any skill. The growth period can be 1 to 3 years, because some years have missing values.
So, if there are records for 2000, 2001 and 2002 for a student and a skill, to calculate growth for 2002, we need to look at 2001.
If there were only records for 2000 and 2002 for a student and a skill, to calculate growth, we can look at 2000 (only if 2001 is not present).
I thought of doing a self join to create a basis to compare scores. Tried to create the growth period logic in this join condition but got stuck.
SELECT q1.STUDENT, q1.SKILL, q1.YEAR, q2.YEAR, q1.SCORE, q2.SCORE
FROM Table q1
INNER JOIN Table q2
ON q1.STUDENT = q2.STUDENT AND q1.SKILL = q2.SKILL AND ...
-- This is where I get stuck
(q1.YEAR = q2.YEAR - 1) -- Case 1
(q1.YEAR <> q2.YEAR - 1) AND (q1.YEAR = q2.YEAR - 2) -- Case 2
(q1.YEAR <> q2.YEAR - 1) AND (q1.YEAR <> q2.YEAR - 2) AND (q1.YEAR = q2.YEAR - 3) -- Case 3
I understand that these cases are kind of getting unioned right now? How do I make them run in an IF logic manner?
Sample Data:
Should they be three different queries unioned together instead?
You can use the lag window function to calculate the growth instead of a self join.
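with t as
(
  select
    student, skill,
    -- growth vs. the previous available year, only if it is at most 3 years back
    case when year - lag(year) over w <= 3 then score - lag(score, 1) over w end as growth
  from _table
  window w as (partition by student, skill order by year)
)
-- distinct on (skill) keeps the first row per skill, i.e. the largest growth
select distinct on (skill) student, skill, growth
from t
order by skill, growth desc nulls last;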
In the t CTE, growth will be null for the first year of every (student, skill) group of records, and also whenever the previous record is more than 3 years back.
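To make the lag logic easier to follow, here is a minimal Python sketch of the same idea outside the database. The sample rows and the function name best_growth_per_skill are made up for illustration; they are not from the question's data.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows: (student, skill, year, score)
ROWS = [
    ("Ann", "Math", 2000, 50),
    ("Ann", "Math", 2001, 55),
    ("Ann", "Math", 2002, 70),
    ("Bob", "Math", 2000, 40),
    ("Bob", "Math", 2002, 65),  # 2001 is missing, so 2002 is compared with 2000
    ("Bob", "Art", 2000, 30),
    ("Bob", "Art", 2005, 90),   # previous record is 5 years back, so it is ignored
]


def best_growth_per_skill(rows, max_gap=3):
    """Return {skill: (student, growth)} comparing each row with the previous available year."""
    best = {}
    ordered = sorted(rows, key=itemgetter(0, 1, 2))
    for (student, skill), group in groupby(ordered, key=itemgetter(0, 1)):
        prev = None  # (year, score) of the previous record, like lag()
        for _, _, year, score in group:
            if prev is not None and year - prev[0] <= max_gap:
                growth = score - prev[1]
                if skill not in best or growth > best[skill][1]:
                    best[skill] = (student, growth)
            prev = (year, score)
    return best


print(best_growth_per_skill(ROWS))
```

Unlike the distinct on query, this sketch simply omits a skill when no record pair falls within the 3-year window.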
I am trying to write a select query against a PostgreSQL table that contains the settlement history of customers. The result should sum amounts only for customers who are debtors (the sum of all invoice amounts of the customer is greater than zero). In the attached example (picture below), taking the detailed settlement history from date 10.06.2021, you can see that the invoice total of customer A is plus (+) 190000, so this customer's sum should be included in the total. On the other hand, the invoice total of customer B is minus (-) 266000, so this customer is not a debtor and should be skipped. I want a sum that contains only the positive partial sums of each customer, split by customer status, as shown in the screenshot below (Expected result).
I tried query like this:
select s.*, s.active+s.inactive total from
(select to_char(date_trunc('month', debt_date),'YYYY-MM'),
greatest(sum(case when t.status = 'Active' then t.amount::numeric else 0 end),0) active,
greatest(sum(case when t.status = 'Inactive' then t.amount::numeric else 0 end),0) inactive
from customers_settlement t
group by 1) s order by 1;
but it didn't work - manual calculation in Excel gave different results than the query. I guess that there is something missing like:
over (partition by customer)
I believe that professionals like you will be able to help me quickly. Thank you in advance!
I am not totally certain if that is what you want, but you could first group by month and customer, then eliminate negative results, then sum again:
SELECT m,
sum(active) AS "sum(Active)",
sum(inactive) AS "sum(Inactive)",
sum(active) + sum(inactive) AS "sum(Total)"
FROM (SELECT to_char(date_trunc('month', debt_date),'YYYY-MM') AS m,
greatest(sum(t.amount) FILTER (WHERE t.status = 'Active'), 0) AS active,
greatest(sum(t.amount) FILTER (WHERE t.status = 'Inactive'), 0) AS inactive
FROM customers_settlement AS t
GROUP BY m, customer) AS subq
GROUP BY m;
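The two aggregation steps can also be sketched in plain Python, which may help when checking the query against a manual calculation. The individual amounts below are invented; only the per-customer totals (+190000 for A, -266000 for B) come from the question, and the function name debtor_totals is hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (month, customer, status, amount)
ROWS = [
    ("2021-06", "A", "Active", 250000),
    ("2021-06", "A", "Active", -60000),    # A nets to +190000, a debtor
    ("2021-06", "B", "Active", 100000),
    ("2021-06", "B", "Active", -366000),   # B nets to -266000, skipped
    ("2021-06", "C", "Inactive", 5000),
]


def debtor_totals(rows):
    # Step 1: partial sum per (month, customer, status), like the inner GROUP BY
    per_customer = defaultdict(int)
    for month, customer, status, amount in rows:
        per_customer[(month, customer, status)] += amount

    # Step 2: clamp negative partial sums to 0, then add up per month and status
    totals = defaultdict(lambda: {"Active": 0, "Inactive": 0})
    for (month, _customer, status), subtotal in per_customer.items():
        totals[month][status] += max(subtotal, 0)
    return dict(totals)


print(debtor_totals(ROWS))
```

The clamping has to happen per customer before the final sum; clamping the monthly total instead is what made the original query disagree with the Excel figures.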
I am trying to create a table with the average amount of sales per cohort of users that signed up for an account in a certain month. However, I can only figure out how to divide by the number of people that made a purchase in that specific month, which is lower than the total size of the cohort. How do I change the query below so that each of the Avg_Successful_Transacted amounts is divided by the size of cohort 0 for each month?
thank you.
select sum (t.amount_in_dollars)/ count (distinct u.id) as Avg_Successful_Transacted, (datediff(month,[u.created:month],[t.createdon:month])) as Cohort, [u.created:month] as Months,
count (distinct u.id) as Users
from [transaction_cache as t]
left join [user_cache as u] on t.owner = u.id
where t.type = 'savings' and t.status = 'successful' and [u.created:year] > ['2017-01-01':date:year]
group by cohort, months
order by Cohort, Months
You will need to break out the cohort sizing into its own subquery or CTE in order to calculate the total number of distinct users who were created during the month which matches the cohort's basis month.
I approached this by bucketing users by the month they were created using the date_trunc('Month', <date>) function, but you may wish to approach it differently based on the specific business logic that generates your cohorts.
I don't work with Periscope, so the example query below is structured for pure Redshift, but hopefully it is easy to translate the syntax into Periscope's expected format:
WITH cohort_sizes AS (
SELECT date_trunc('Month', created)::DATE AS cohort_month
, COUNT(DISTINCT(id)) AS cohort_size
FROM user_cache u
GROUP BY 1
),
cohort_transactions AS (
SELECT date_trunc('Month', u.created)::DATE AS cohort_month
, t.createdon
, t.owner
, t.type
, t.status
, t.amount_in_dollars
, u.id
, u.created
FROM transaction_cache t
LEFT JOIN user_cache u ON t.owner = u.id
WHERE t.type = 'savings'
AND t.status = 'successful'
AND u.created > '2017-01-01'
)
SELECT SUM(t.amount_in_dollars) / s.cohort_size AS Avg_Successful_Transacted
, datediff(MONTH, t.cohort_month, t.createdon) AS Cohort
, t.cohort_month AS Months -- group by signup month, not by the raw created timestamp
, COUNT(DISTINCT t.owner) AS Users
FROM cohort_transactions t
JOIN cohort_sizes s ON t.cohort_month = s.cohort_month
GROUP BY s.cohort_size, Cohort, Months
ORDER BY Cohort, Months
;
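As a rough illustration of why the cohort size has to be computed separately, here is a small Python sketch. All data, the helper month_index and the function cohort_averages are hypothetical; the transactions are assumed to be already filtered to successful savings transactions.

```python
from collections import defaultdict
from datetime import date

# Hypothetical data: user id -> signup date
USERS = {
    1: date(2018, 1, 5),
    2: date(2018, 1, 20),
    3: date(2018, 1, 28),  # never transacts, but still counts toward the cohort size
    4: date(2018, 2, 3),
}
# Hypothetical transactions: (owner, created_on, amount_in_dollars)
TRANSACTIONS = [
    (1, date(2018, 1, 10), 30.0),
    (1, date(2018, 2, 11), 60.0),
    (2, date(2018, 2, 15), 30.0),
    (4, date(2018, 2, 20), 10.0),
]


def month_index(d):
    """Number of months since year 0, used to count month boundaries crossed."""
    return d.year * 12 + d.month


def cohort_averages(users, transactions):
    # Cohort size: every user created in the month, purchasers or not
    cohort_size = defaultdict(int)
    for created in users.values():
        cohort_size[(created.year, created.month)] += 1

    # Sum of amounts per (cohort month, months since signup)
    sums = defaultdict(float)
    for owner, created_on, amount in transactions:
        created = users[owner]
        cohort = (created.year, created.month)
        offset = month_index(created_on) - month_index(created)
        sums[(cohort, offset)] += amount

    # Divide by the full cohort size, not by the number of purchasers
    return {key: total / cohort_size[key[0]] for key, total in sums.items()}


print(cohort_averages(USERS, TRANSACTIONS))
```

In this made-up data the January cohort has three users, so its month-0 average is 30 / 3 rather than 30 / 1.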
[using postgres]
So the background to my scenario is:
I'm trying to get the average expenses by age where the user is active
cost = 40
Only 2 of the 33 year olds have purchased something
I have 34 active members that are 33 years old and active (whether or not they made a payment is irrelevant in this count)
with this in mind money spent per age = 40 / 34 = 1.18
what I am getting right now is = 40 / 2 = 20
I understand that it's constrained by the two users who made a purchase
So where did I get all of this?
select date_part('year', age(birthday)) as age,
avg(cost)
from person
inner join payment on person.person_id = payment.person_id
inner join product on payment.product_id = product.product_id
where
date_part('year', age(birthday))= 33 and user_state = 'active'
group by age
Unfortunately, when using an aggregate function (in this example avg()), it seems the aggregate is constrained to the result of the inner join (I've tried a left join to keep access to all users, but it didn't seem to work, since I still got the undesired result of 20). Is there a way to avoid this? In other words, can I make the avg() call specific to my person table rather than to the result of the join?
If it matters, this is how I am retrieving total sum.
select sum(cost)
from person
inner join payment on person.person_id = payment.person_id
inner join product on payment.product_id = product.product_id
where
date_part('year', age(birthday))= 33
and
user_state = 'active'
= 40
The obvious approach is to count the people I want and then do the sum separately, but I'm trying to avoid going from one query to another.
avg skips nulls, so coalesce those null values into zeros and, obviously, use a left join:
select
date_part('year', age(birthday)) as age,
avg(coalesce(cost,0))
from
person
left join
payment on ...
left join
product on ...
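The effect can be reproduced with a small self-contained sketch. It uses Python's sqlite3 module rather than Postgres, a simplified schema (no birthday column, so the age filter is left out), and invented rows that mirror the question's numbers: 34 active people, two of whom paid 20 each.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical schema: only the columns the join needs
conn.executescript("""
    CREATE TABLE person  (person_id INTEGER PRIMARY KEY, user_state TEXT);
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, cost REAL);
    CREATE TABLE payment (person_id INTEGER, product_id INTEGER);
    INSERT INTO product VALUES (1, 20.0);
""")
# 34 active people; only the first two bought something (20 each, 40 in total)
conn.executemany("INSERT INTO person VALUES (?, 'active')", [(i,) for i in range(1, 35)])
conn.executemany("INSERT INTO payment VALUES (?, 1)", [(1,), (2,)])

# Inner joins: only the two purchasers survive, so avg = 40 / 2
inner_avg, = conn.execute("""
    SELECT avg(cost) FROM person
    INNER JOIN payment ON person.person_id = payment.person_id
    INNER JOIN product ON payment.product_id = product.product_id
    WHERE user_state = 'active'
""").fetchone()

# Left joins + coalesce: non-purchasers contribute a 0, so avg = 40 / 34
left_avg, = conn.execute("""
    SELECT avg(coalesce(cost, 0)) FROM person
    LEFT JOIN payment ON person.person_id = payment.person_id
    LEFT JOIN product ON payment.product_id = product.product_id
    WHERE user_state = 'active'
""").fetchone()

print(inner_avg, round(left_avg, 2))
```

Note that the average is taken over joined rows, so a person with several payments contributes several rows; if that can happen, summing the cost and dividing by count(distinct person.person_id) may be closer to what is wanted.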
I have a table (table1) that holds fact data. Let's say its columns are (products, start, end, value1, month [calculated column]), where start and end are timestamps.
What I am trying to build is a table and a bar chart that show the sum of value1 for each month divided by a factor specific to that month (the report is on a yearly basis; I load one year of data into Qlik Sense).
I used start and end to generate an autoCalendar as a timestamp field in the Qlik Sense data manager. Then I get the month from start and store it in the calculated column "month" in table1 using the autoCalendar feature (Month(start.autoCalendar.Month)).
After that, I created another table (table2) with two columns (month, value2). The value2 column is the factor by which I need to divide value1 for each month, that is (sum(value1) / 1520 [for January], sum(value1) / 650 [for February]) and so on. The two month columns are the link between the tables in Qlik Sense. In my expression I can then calculate sum(value1) and pick the value2 that matches the month from table2.
I can make the calculation correctly, but one thing is still missing. The products do not have a value (value1) in every month. For example, let's say I have products (p1, p2, ...). In table1 I have data for p1 for (Jun, Feb, Nov) and for p2 for (Mar, Apr, May, Dec). Hence, when the data is presented in a Qlik Sense table or bar chart, I can see only the months that have values in the fact table. The Qlik Sense table contains 2 dimensions, [products] and [month], and the measure m1 [sum(value1)/value2].
What I want is a yearly report showing all 12 months, but in my example I see only 3 months for p1 and 4 months for p2. When there is no data, the measure column [m1] should be 0, and I want that 0 to appear in my table and chart.
I think it might be a solution if I could show the data of the Qlik Sense table as a right outer join of my relationship (table1.month >> table2.month). So, is it possible in Qlik Sense to have an outer join in such an example, or is there a better solution to my problem?
Update
Got it. Not sure if this is the best approach, but in these cases I usually fill in the missing records during the script load.
// Main table
Sales:
Load
*,
ProductId & '-' & Month as Key_Product_Month
;
Load * Inline [
ProductId, Month, SalesAmount
P1 , 1 , 10
P1 , 2 , 20
P1 , 3 , 30
P2 , 1 , 40
P2 , 2 , 50
];
// Get distinct products and assign 0 as SalesAmount
Products_Temp:
Load
distinct ProductId,
0 as SalesAmount
Resident
Sales
;
join (Products_Temp) // Cross join in this case
Load
distinct Month
Resident
Sales
;
// After the cross join Products_Temp table contains
// all possible combinations between ProductId and Month
// and for each combination SalesAmount = 0
Products_Temp_1:
Load
*,
ProductId & '-' & Month as Key_Product_Month1 // Generate the unique id
Resident
Products_Temp
;
Drop Table Products_Temp; // we dont need this anymore
Concatenate (Sales)
// Concatenate to the main table only the ProductId-Month
// combinations that are missing from it
Load
*
Resident
Products_Temp_1
Where
Not Exists(Key_Product_Month, Key_Product_Month1)
;
Drop Table Products_Temp_1; // not needed any more
Drop Fields Key_Product_Month1, Key_Product_Month; // not needed any more
Before the script:
After the script:
The table link in Qlik Sense (and QlikView) works more like a full outer join. If you want to show the id from only one table (and not all), you can create an additional field in the table you want and then perform your calculations on this field instead of the linked one. For example:
Table1:
Load
id,
value1
From
MyQVD1.qvd (qvd)
;
Table2:
Load
id,
id as MyRightId,
value2
From
MyQVD2.qvd (qvd)
;
In the example above both tables will still be linked on id field but if you want to count only the id values in the right table (Table2) you just need to type
count( MyRightId )
I know this question has been answered, and I quite like Stefan's approach, but I hope my answer will help other users. I recently ran into something similar and used slightly different logic, with the following script:
// Main table
Sales:
Load * Inline [
ProductId, Month, SalesAmount
P1 , 1 , 10
P1 , 2 , 20
P1 , 3 , 30
P2 , 1 , 40
P2 , 2 , 50
];
Cartesian:
//Create a combination of all ProductId and Month and then load the existing data into this table
NoConcatenate Load distinct ProductId Resident Sales;
Join
Load Distinct Month Resident Sales;
Join Load ProductId, Month, SalesAmount Resident Sales; //Existing data loaded
Drop Table Sales;
This results in the following output table:
The Null value in the new (bottom-most) row can stay like that, but if you prefer to replace it, use a Map ... Using statement.
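For readers who do not use Qlik, the idea behind both load scripts (build every ProductId and Month combination, then keep existing values and fill the gaps) can be sketched in a few lines of Python. The data is the inline table from the scripts above; the function name fill_missing and the default of 0 are illustrative choices.

```python
from itertools import product

# Same rows as the inline Sales table: (ProductId, Month) -> SalesAmount
SALES = {
    ("P1", 1): 10,
    ("P1", 2): 20,
    ("P1", 3): 30,
    ("P2", 1): 40,
    ("P2", 2): 50,
}


def fill_missing(sales, default=0):
    """Return sales for every product/month combination, using default for gaps."""
    products = sorted({p for p, _ in sales})
    months = sorted({m for _, m in sales})
    # Cartesian product of the distinct values, like the cross join in the script
    return {(p, m): sales.get((p, m), default) for p, m in product(products, months)}


filled = fill_missing(SALES)
print(filled[("P2", 3)])  # the only combination that was missing
```

Passing default=None instead of 0 would mirror the second script, where the new row keeps a Null value.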
I have a pretty simple chart with a likely common issue. I've searched for several hours on the interweb but only get so far in finding a similar situation.
The data I'm pulling basically contains created_by, person_id and a risk score.
The risk score can be:
1 VERY LOW
2 LOW
3 MODERATE STABLE
4 MODERATE AT RISK
5 HIGH
6 VERY HIGH
I want to get a headcount of persons at each risk score and display a risk count even if there is a count of 0 for that risk score but SSRS 2005 likes to suppress zero counts.
I've tried this in the point labels
=IIF(IsNothing(count(Fields!person_id.value)),0,count(Fields!person_id.value))
Ex: I'm missing values for "1 LOW" as the creator does not have any "1 LOW" they've assigned risk scores for.
Here's a screenshot of what I get, but I'd like to have a column even for a count that doesn't exist in the returned results.
@Nathan
Example scenario:
select professor.name, grades.score, student.person_id
from student
inner join grades on student.person_id = grades.person_id
inner join professor on student.professor_id = professor.professor_id
where
student.professor_id = @professor
Not all students are necessarily in the grades table.
I have a =Count(Fields!person_id.Value) for my data points & series is grouped on =Fields!score.Value
If there were a bunch of A, B, D grades but no C's and F's, how would I show labels for potentially non-existent counts?
In your example, the problem is that no results are returned for grades that are not linked to any students. To solve this ideally there would be a table in your source system which listed all the possible values of "score" (e.g. A - F) and you would join this into your query such that at least one row was returned for each possible value.
If such a table doesn't exist and the possible score values are known and static, then you could manually create a list of them in your query. In the example below I create a subquery that returns a combination of all professors and all possible scores (A - F) and then LEFT join this to the grades and students tables (left join means that the professor/score rows will be returned even if no students have those scores in the "grades" table).
SELECT
professor.name
, professorgrades.score
, student.person_id
FROM
(
SELECT professor_id, score
FROM professor
CROSS JOIN
(
SELECT 'A' AS score
UNION
SELECT 'B'
UNION
SELECT 'C'
UNION
SELECT 'D'
UNION
SELECT 'E'
UNION
SELECT 'F'
) availablegrades
) professorgrades
INNER JOIN professor ON professorgrades.professor_id = professor.professor_id
LEFT JOIN grades ON professorgrades.score = grades.score
LEFT JOIN student ON grades.person_id = student.person_id AND
professorgrades.professor_id = student.professor_id
WHERE professorgrades.professor_id = 1
See a live example of how this works here: SQLFIDDLE
SELECT RS.RiskScoreId, RS.Description, SUM(DT.RiskCount) AS RiskCount
FROM (
SELECT RiskScoreId, 1 AS RiskCount
FROM People
UNION ALL
SELECT RiskScoreId, 0 AS RiskCount
FROM RiskScores
) DT
INNER JOIN RiskScores RS ON RS.RiskScoreId = DT.RiskScoreId
GROUP BY RS.RiskScoreId, RS.Description
ORDER BY RS.RiskScoreId
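The query above works by adding one zero-count row per risk score, so every score appears in the result even when nobody has it. A minimal way to try it out is sketched below with Python's sqlite3 module. The People and RiskScores table names come from the query; the column layout of People and the three sample people are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE RiskScores (RiskScoreId INTEGER PRIMARY KEY, Description TEXT);
    CREATE TABLE People (person_id INTEGER PRIMARY KEY, RiskScoreId INTEGER);
    INSERT INTO RiskScores VALUES
        (1, 'VERY LOW'), (2, 'LOW'), (3, 'MODERATE STABLE'),
        (4, 'MODERATE AT RISK'), (5, 'HIGH'), (6, 'VERY HIGH');
    -- Hypothetical people: two with score 2, one with score 5
    INSERT INTO People VALUES (1, 2), (2, 2), (3, 5);
""")

rows = conn.execute("""
    SELECT RS.RiskScoreId, RS.Description, SUM(DT.RiskCount) AS RiskCount
    FROM (
        SELECT RiskScoreId, 1 AS RiskCount FROM People      -- one per person
        UNION ALL
        SELECT RiskScoreId, 0 AS RiskCount FROM RiskScores  -- zero row per score
    ) DT
    INNER JOIN RiskScores RS ON RS.RiskScoreId = DT.RiskScoreId
    GROUP BY RS.RiskScoreId, RS.Description
    ORDER BY RS.RiskScoreId
""").fetchall()

for row in rows:
    print(row)
```

With this sample data all six scores come back, with counts of 2 for LOW, 1 for HIGH and 0 for the rest, which gives the chart a data point for every category.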