The table that I am querying is:
table testing_table:

testType | period_from | period_to | copies
---------+-------------+-----------+-------
       1 |    20180101 |  20181201 |      1
       2 |    20180101 |  20191201 |      1
       3 |    20190101 |  20191201 |      1
I want to loop through the array below and use the following query to generate values like this:
DateVar  | ABTEST | CDTEST | EFTEST
---------+--------+--------+-------
20180101 |      4 |      0 |      0
20180201 |      3 |      4 |      2
dateVar = ['20180101','20180201','20180501'].
I am trying to develop an SQL query like this:
SELECT
    SUM(CASE
            WHEN testType = 1 AND period_from <= dateVar AND period_to >= dateVar THEN copies
            ELSE 0
        END) AS "ABTEST",
    SUM(CASE
            WHEN testType = 2 AND period_from <= dateVar AND period_to >= dateVar THEN copies
            ELSE 0
        END) AS "CDTEST",
    SUM(CASE
            WHEN testType = 3 AND period_from <= dateVar AND period_to >= dateVar THEN copies
            ELSE 0
        END) AS "EFTEST"
FROM testing_table;
I am lost as to what to do with it. Should I look into functions?
I think you should use the unnest function to accomplish what you are asking. I have written a query you may want to check:
SELECT DTVAR,
       SUM(CASE WHEN TestType = 1 THEN copies ELSE 0 END) AS "ABTEST",
       SUM(CASE WHEN TestType = 2 THEN copies ELSE 0 END) AS "CDTEST",
       SUM(CASE WHEN TestType = 3 THEN copies ELSE 0 END) AS "EFTEST"
FROM (
    SELECT DTVAR, TestType, SUM(copies) AS copies
    FROM testing_table
    INNER JOIN (
        SELECT DTVAR
        FROM unnest(ARRAY['20180101','20180201','20180501']) AS DTVAR
    ) AA
        ON (period_from <= DTVAR AND period_to >= DTVAR)
    GROUP BY DTVAR, TestType
) A
GROUP BY DTVAR
Hope this helps.
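As a variant, here is a minimal sketch of the same pivot using the FILTER clause (assuming PostgreSQL 9.4 or later; FILTER also appears in an answer further down), which avoids the CASE boilerplate:

SELECT dtvar,
       COALESCE(SUM(copies) FILTER (WHERE testType = 1), 0) AS "ABTEST",
       COALESCE(SUM(copies) FILTER (WHERE testType = 2), 0) AS "CDTEST",
       COALESCE(SUM(copies) FILTER (WHERE testType = 3), 0) AS "EFTEST"
FROM testing_table
JOIN unnest(ARRAY['20180101','20180201','20180501']) AS dtvar
  ON period_from <= dtvar AND period_to >= dtvar
GROUP BY dtvar
ORDER BY dtvar;
-- assumes period_from/period_to are stored as YYYYMMDD text; add casts if they are dates or integers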
For the following table:
-- People
id | category | count
----+----------+-------
1 | a | 2
1 | a | 3
1 | b | 2
2 | a | 2
2 | b | 3
3 | a | 1
3 | a | 2
I know that I can find the max count for each id in each category by doing:
SELECT id, category, max(count) from People group by category, id;
With result:
id | category | max
----+----------+-------
1 | a | 3
1 | b | 2
2 | a | 2
2 | b | 3
3 | a | 2
But what if now I want to label the max values differently, like:
id | max_b_count | max_a_count
----+-------------+------------
1 | 2 | 3
2 | 3 | 2
3 | Null | 2
Should I do something like the following?
WITH t AS (SELECT id, category, max(count) from People group by category, id)
SELECT t.id, t.count as max_a_count from t where t.category = 'a'
FULL OUTER JOIN t.id, t.count as max_b_count from t where t.category = 'b'
on t.id;
It looks weird to me.
This is the exact use case for which the filter_clause was added to aggregate expressions.
With a FILTER clause you can limit which rows an aggregate processes:
aggregate_name ( * ) [ FILTER ( WHERE filter_clause ) ]
Your example:
SELECT id,
max(count) filter (where category = 'a') as max_a_count,
max(count) filter (where category = 'b') as max_b_count
from People
group by id
order by 1;
id|max_a_count|max_b_count|
--+-----------+-----------+
1| 3| 2|
2| 2| 3|
3| 2| |
This is one way you can do it:
with T as (select id, category, max(count_ab) maks
from people
group by id, category
order by id)
select t3.id
, (select t1.maks from T t1 where category = 'b' and t1.id = t3.id) max_b_count
, (select t2.maks from T t2 where category = 'a' and t2.id = t3.id) max_a_count
from T t3
group by t3.id
order by t3.id
Also, as you can see, I have changed the name of the column count to count_ab, because it is not good practice to use keywords as column names.
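As a side note, if renaming is not possible, PostgreSQL also accepts a reserved or otherwise awkward column name when it is double-quoted, for example:

select id, category, max("count") as maks
from people
group by id, category;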
I am calculating the last 12 months' counts after joining multiple tables. My current output is OK, but it is not what I want: I want to add another column named "Current Month". The idea is that if I view the report in May, it should start from last year's May and run through this year's April, with May as the current month, for 13 count columns in total. My intuition says a window query will help me here, but I am not sure how I can do that.
select
c.name,
SUM(case when RTRIM(TO_CHAR(mor.sent_at , 'Month')) = 'January' THEN 1 END) as "January",
SUM(case when RTRIM(TO_CHAR(mor.sent_at , 'Month')) = 'February' THEN 1 END) as "February",
SUM(case when RTRIM(TO_CHAR(mor.sent_at , 'Month')) = 'March' THEN 1 END) as "March",
SUM(case when RTRIM(TO_CHAR(mor.sent_at , 'Month')) = 'April' THEN 1 END) as "April",
SUM(case when RTRIM(TO_CHAR(mor.sent_at , 'Month')) = 'May' THEN 1 END) as "May",
SUM(case when RTRIM(TO_CHAR(mor.sent_at , 'Month')) = 'June' THEN 1 END) as "June",
SUM(case when RTRIM(TO_CHAR(mor.sent_at , 'Month')) = 'July' THEN 1 END) as "July",
SUM(case when RTRIM(TO_CHAR(mor.sent_at , 'Month')) = 'August' THEN 1 END) as "August",
SUM(case when RTRIM(TO_CHAR(mor.sent_at , 'Month')) = 'September' THEN 1 END) as "September",
SUM(case when RTRIM(TO_CHAR(mor.sent_at , 'Month')) = 'October' THEN 1 END) as "October",
SUM(case when RTRIM(TO_CHAR(mor.sent_at , 'Month')) = 'November' THEN 1 END) as "November",
SUM(case when RTRIM(TO_CHAR(mor.sent_at , 'Month')) = 'December' THEN 1 END) as "December"
from analytics_outbox mo
inner join analytics_outbox_recipient mor on mor.analytics_outbox_id = mo.id
inner join customer c on c.id = mo.customer_id
group by c.name
Current Output:
name |january|february|march |april |may |june|july|august|september|october|november|december|
----------------------------------+-------+--------+------+-------+-------+----+----+------+---------+-------+--------+--------+
ABC | | | 1| 2| | | | | | | | |
DEF | 11| 24| 34| 32| 19| | | | | | | |
GEH | 9| 3| 7| 18| 22| | | | | | | |
IJK | | | | 1| | | | | | | | |
Dynamic result column names are only possible with dynamic SQL.
This should do the job efficiently, save the dynamic column names:
SELECT c.name
, to_char(t.mon, 'Month YYYY') AS report_month
, count(*) FILTER (WHERE mor.sent_at >= t.mon - interval '12 mon' AND mor.sent_at < t.mon - interval '11 mon') AS mon1
, count(*) FILTER (WHERE mor.sent_at >= t.mon - interval '11 mon' AND mor.sent_at < t.mon - interval '10 mon') AS mon2
, count(*) FILTER (WHERE mor.sent_at >= t.mon - interval '10 mon' AND mor.sent_at < t.mon - interval '09 mon') AS mon3
-- etc.
FROM analytics_outbox mo
JOIN analytics_outbox_recipient mor ON mor.analytics_outbox_id = mo.id
JOIN customer c ON c.id = mo.customer_id
, (SELECT date_trunc('month', now())) AS t(mon) -- add once for ease of use
GROUP BY 1;
This compares unaltered values from sent_at to a constant value (computed once), which is cheaper than running each value through multiple functions before comparison.
Possible corner case issues with time zone and timestamp vs. timestamptz unresolved due to missing input.
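Purely as a sketch of the dynamic-SQL route mentioned above (assuming PL/pgSQL; the temp table name report_13m and the column names mon1 .. mon13 are made up for illustration), the thirteen month columns could be generated rather than typed out:

DO $$
DECLARE
    col_sql text := '';
BEGIN
    -- one FILTER column per month: i = 1 is twelve months back, i = 13 is the current month
    FOR i IN 1..13 LOOP
        col_sql := col_sql || format(
            $f$, count(*) FILTER (WHERE mor.sent_at >= t.mon - interval '%s mon'
                                    AND mor.sent_at <  t.mon - interval '%s mon') AS mon%s$f$,
            13 - i, 12 - i, i);
    END LOOP;
    -- for i = 13 the upper bound is t.mon - interval '-1 mon', i.e. the start of next month
    EXECUTE 'CREATE TEMP TABLE report_13m AS
             SELECT c.name' || col_sql || '
             FROM analytics_outbox mo
             JOIN analytics_outbox_recipient mor ON mor.analytics_outbox_id = mo.id
             JOIN customer c ON c.id = mo.customer_id
                , (SELECT date_trunc(''month'', now())) AS t(mon)
             GROUP BY 1';
END $$;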
Hi, I have a DataFrame as below:
+-------+--------+
|id |level |
+-------+--------+
| 0 | 0 |
| 1 | 0 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
| 5 | 0 |
| 6 | 1 |
| 7 | 1 |
| 8 | 0 |
| 9 | 1 |
| 10 | 0 |
+-------+--------+
and I need the sum of each run of consecutive 1's, so the output should be 3, 2, 1. However, the constraint in this scenario is that I cannot use a UDF. Is there any built-in Scala/Spark function that can do this trick?
You could use row_number and count (SQL/DataFrame API) to count the number of consecutive repeats of a value in a column.
The trick is that the difference between a row's overall row number and its row number within its level partition stays constant across a run of consecutive equal values, so that difference can be used as a group key.
Scala
val df = spark.createDataFrame(Seq((0,0),(1,0),(2,1),(3,1),(4,1),(5,0),(6,1),(7,1),(8,0),(9,1),(10,0))).toDF("id","level")
df.createOrReplaceTempView("DT")
val df_cnt = spark.sql("select level, count(*) from (select *, (row_number() over (order by id) - row_number() over (partition by level order by id)) as grp from DT order by id) as t where level != 0 group by grp, level")
df_cnt.show()
The sequence of id must be maintained otherwise it will produce the wrong result.
Pyspark
df = spark.createDataFrame([(0,0),(1,0),(2,1),(3,1),(4,1),(5,0),(6,1),(7,1),(8,0),(9,1),(10,0)]).toDF("id","level")
df.createOrReplaceTempView('DT')
# same query as before, via spark.sql(...)
SQL
select level, count(*)
from (select *,
             (row_number() over (order by id) -
              row_number() over (partition by level order by id)
             ) as grp
      from DT
      order by id) as t
where level != 0
group by grp, level
Intermediate SQL computation detail (row offset and grouping):
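For the sample data, with rn_all = row_number() over (order by id) and rn_level = row_number() over (partition by level order by id), the inner query produces:

id | level | rn_all | rn_level | grp
---+-------+--------+----------+----
 0 |     0 |      1 |        1 |   0
 1 |     0 |      2 |        2 |   0
 2 |     1 |      3 |        1 |   2
 3 |     1 |      4 |        2 |   2
 4 |     1 |      5 |        3 |   2
 5 |     0 |      6 |        3 |   3
 6 |     1 |      7 |        4 |   3
 7 |     1 |      8 |        5 |   3
 8 |     0 |      9 |        4 |   5
 9 |     1 |     10 |        6 |   4
10 |     0 |     11 |        5 |   6

Filtering level != 0 and grouping by (grp, level) then yields the run lengths 3, 2 and 1. Note that grp alone is not unique across levels (grp = 3 occurs for both level 0 and level 1), which is why level must be part of the GROUP BY.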
You could do something like this:
val seq = Seq(0,0,1,1,1,0,1,1,0,1,0)
val seq1s = seq.foldLeft("")(_ + _).split("0")
seq1s.map(_.sliding(1).count(_ == "1"))
res: Array[Int] = Array(0, 0, 3, 2, 1)
If you don't want the 0s there, you could just filter them out using this instead:
seq1s.map(_.sliding(1).count(_ == "1")).filterNot(_ == 0)
res: Array[Int] = Array(3, 2, 1)
I'm using Amazon RDS (Aurora) so don't have access to the crosstab() function.
My dataset is a count of particular actions per user and looks like:
| uid | action1 | action2 |
| alice | 2 | 2 |
| bob | 1 | 2 |
| charlie | 5 | 0 |
How can I pivot this dataset to make a histogram of action counts? So it would look like:
# | Action1 | Action2
---------------------
0 | | 1
1 | 1 |
2 | 1 | 2
3 | |
4 | |
5 | 1 |
6 | |
Here's a SQL fiddle I've been using with the values already entered: http://sqlfiddle.com/#!17/2b966/1
I have a solution but it is very verbose:
WITH nums AS (
SELECT n
FROM (VALUES (0), (1), (2), (3), (4), (5)) nums(n)
),
action1_counts as (
select
action1,
count(*) as total
from test
group by 1
),
action2_counts as (
select
action2,
count(*) as total
from test
group by 1
)
select
nums.n,
coalesce(a1.total, 0) as Action1,
coalesce(a2.total, 0) as Action2
from nums
LEFT join action1_counts a1 on a1.action1 = nums.n
LEFT join action2_counts a2 on a2.action2 = nums.n
order by 1
Assume action is between 0 and 6.
select a1.action, a1.action1, nullif(count(t2.action2),0) as action2
from
( select t.action, nullif(count(t1.action1),0) as action1
from
(select action from generate_series(0,6) g(action)) t
left join
test t1
on t1.action1 = t.action
group by t.action
) a1
left join
test t2
on t2.action2 = a1.action
group by a1.action, a1.action1
order by a1.action
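If the FILTER clause is available (PostgreSQL 9.4+), a shorter sketch of the same histogram is possible: one pass over generate_series, with nullif keeping the empty cells blank as above:

select g.action,
       nullif(count(*) filter (where t.action1 = g.action), 0) as action1,
       nullif(count(*) filter (where t.action2 = g.action), 0) as action2
from generate_series(0, 6) as g(action)
left join test t on t.action1 = g.action or t.action2 = g.action
group by g.action
order by g.action;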
How can I simulate an XOR function in PostgreSQL? Or, at least, I think this is an XOR-kind-of situation.
Let's say the data is as follows:
id | col1 | col2 | col3
---+------+------+------
1 | 1 | | 4
2 | | 5 | 4
3 | | 8 |
4 | 12 | 5 | 4
5 | | | 4
6 | 1 | |
7 | | 12 |
And I want to return one column for those rows where only one of the columns is filled in (ignore col3 for now).
Let's start with this example of 2 columns:
SELECT
id, COALESCE(col1, col2) AS col
FROM
my_table
WHERE
COALESCE(col1, col2) IS NOT NULL -- at least 1 is filled in
AND
(col1 IS NULL OR col2 IS NULL) -- at least 1 is empty
;
This works nicely and should result in:
id | col
---+----
1 | 1
3 | 8
6 | 1
7 | 12
But now, I would like to include col3 in a similar way. Like this:
id | col
---+----
1 | 1
3 | 8
5 | 4
6 | 1
7 | 12
How can this be done in a more generic way? Does Postgres support such a method?
I'm not able to find anything like it.
Rows with exactly 1 column filled in:
select * from my_table where
 (col1 is not null)::integer
+(col2 is not null)::integer
+(col3 is not null)::integer
=1
Rows with 1 or 2 columns filled in:
select * from my_table where
 (col1 is not null)::integer
+(col2 is not null)::integer
+(col3 is not null)::integer
between 1 and 2
The "case" statement might be your friend here, the "min" aggregated function doesn't affect the result.
select id, min(coalesce(col1,col2,col3))
from my_table
group by 1
having sum(case when col1 is null then 0 else 1 end+
case when col2 is null then 0 else 1 end+
case when col3 is null then 0 else 1 end)=1
[Edit]
Well, I found a better answer without using aggregate functions. It's still based on the use of "case", but I think it is simpler.
select id, coalesce(col1,col2,col3)
from my_table
where (case when col1 is null then 0 else 1 end+
case when col2 is null then 0 else 1 end+
case when col3 is null then 0 else 1 end)=1
How about:
select coalesce(col1, col2, col3)
from my_table
where array_length(array_remove(array[col1, col2, col3], null), 1) = 1
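On PostgreSQL 9.6 or later, the built-in num_nonnulls function expresses the same condition without the array gymnastics:

select coalesce(col1, col2, col3)
from my_table
where num_nonnulls(col1, col2, col3) = 1;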