I want to write a query that transforms the input below into the expected output.
Input:
Num Sr_no Exp_no
NULL 1 1
NULL 2 1
ABC_1 3 1
NULL 4 1
NULL 1 2
NULL 2 2
ABC_2 3 2
NULL 4 4
Expected Output:
Num Sr_no Exp_no
ABC_1 1 1
ABC_1 2 1
ABC_1 3 1
ABC_1 4 1
ABC_2 1 2
ABC_2 2 2
ABC_2 3 2
ABC_2 4 4
As there are no further details in the question, this answer is based on the following assumptions:
you want to fill the num field based on exp_no grouping, and
there is only one non-null num value per exp_no group.
Try this:
with cte as (
    -- pick the single non-null num per exp_no group
    select distinct on (exp_no) exp_no, num
    from test
    where num is not null
    order by exp_no
)
select
    coalesce(t1.num, cte.num) as num,
    t1.sr_no,
    t1.exp_no
from test t1
left join cte on t1.exp_no = cte.exp_no;
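Under the same one-value-per-group assumption, a window function can fill the gaps without the join; this is just a sketch against the same table, not a tested alternative:
select
    -- max() over the partition returns the group's single non-null num
    coalesce(num, max(num) over (partition by exp_no)) as num,
    sr_no,
    exp_no
from test;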
Given the following table
time kind counter key1 value
----------------------------------------
1 1 1 1 1
2 0 1 1 2
3 0 1 2 3
5 0 1 1 4
5 1 2 2 5
6 0 2 3 6
7 0 2 2 7
8 1 3 3 8
9 1 4 3 9
How would one select the value in the first row
immediately after and immediately before each
row of kind 1, ordered by time, where the key1
value is the same in both instances, i.e.:
time value prevvalue nextvalue
---------------------------------------------
1 1 0n 2
5 5 3 7
8 8 6 0n
9 9 6 0n
Here are some of the things I have tried, though
to be honest I have no idea how to canonically achieve
something like this in q, where the prior value has a
variable offset from the current row:
select
prev[value],
next[value],
by key1 where kind<>1
update 0N^prevval,0N^nextval from update prevval:prev value1,nextval:next value1 by key1 from table
Some advice or a pointer on how to achieve this would be great!
Thanks
I was able to use the following code to return a table meeting your requirements. If this is correct, the expected output you have provided is incorrect; otherwise I have misunderstood the question.
q)table:([] time:1 2 3 5 5 6 7 8 9;kind:1 0 0 0 1 0 0 1 1;counter:1 1 1 1 2 2 2 3 4;key1:1 1 2 1 2 3 2 3 3;value1:1 2 3 4 5 6 7 8 9)
q)tab2:update 0N^prevval,0N^nextval from update prevval:prev value1,nextval:next value1 by key1 from table
q)tab3:select from tab2 where kind=1
time value1 prevval nextval
---------------------------
1    1              2
5    5      3       7
8    8      6       9
9    9      8
The update statement in tab2:
update 0N^prevval,0N^nextval from update prevval:prev value1,nextval:next value1 by key1 from table
simply adds two columns onto the original table with the previous and next value1 for each row within its key1 group. 0N^ fills the cells that have no previous or next value in their group with nulls.
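As a quick illustration of the primitives involved (console sketch):
q)prev 1 2 3     / 0N 1 2 - each item replaced by the one before it
q)next 1 2 3     / 2 3 0N - each item replaced by the one after it
q)0^prev 1 2 3   / 0 1 2  - ^ fills nulls with its left argument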
The select statement in tab3:
tab3:select from tab2 where kind=1
is filtering tab2 for rows where kind=1.
The final select statement:
select time,value1,prevval,nextval from tab3
picks out just the columns you want in the final result.
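For reference, the three steps can also be chained into a single expression; a sketch of the same logic, not separately verified:
q)select time,value1,prevval,nextval from (update 0N^prevval,0N^nextval from update prevval:prev value1,nextval:next value1 by key1 from table) where kind=1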
Hope this answers your question.
Thanks,
Caitlin
I have a table like this:
type code desc store Sales/Day Stock
-----------------------------------------------
1 AA1 abc 101 3 6
1 AA2 abd 101 4 0
1 AA3 abf 101 4 3
2 BA1 bba 101 5 1
2 BA2 bbc 101 2 1
1 AA1 abc 102 1 4
1 AA2 abd 102 2 0
2 BA1 bba 102 4 2
2 BA2 bbc 102 5 5
etc.
How can I show the result table like this:
type code desc Store 101 Store 102
Sales/Day | Stock Sales/Day | Stock
--------------------------------------------------------------
1 AA1 abc 3 6 1 4
1 AA2 abd 4 0 2 0
1 AA3 abf 4 3 0 0
2 BA1 bba 5 1 4 2
2 BA2 bbc 2 1 5 5
etc.
Note:
The colspan is for display purposes only.
First way: FILTER
SELECT
type,
code,
"desc",
COALESCE(SUM(sales_day) FILTER (WHERE store = 101), 0) as sales_day_101,
COALESCE(SUM(stock) FILTER (WHERE store = 101), 0) as stock_101,
COALESCE(SUM(sales_day) FILTER (WHERE store = 102), 0) as sales_day_102,
COALESCE(SUM(stock) FILTER (WHERE store = 102), 0) as stock_102
FROM mytable
GROUP BY type, code, "desc"
ORDER BY type, code
This aggregates your values. I took SUM, but since your rows are distinct, many other aggregate functions (MIN, MAX) would do it as well. The FILTER clause restricts each aggregate to the rows of one store.
The COALESCE avoids NULL values when no rows exist for an aggregation (like AA3 in store 102).
Second way: CASE WHEN
SELECT
type,
code,
"desc",
SUM(CASE WHEN store = 101 THEN sales_day ELSE 0 END) as sales_day_101,
SUM(CASE WHEN store = 101 THEN stock ELSE 0 END) as stock_101,
SUM(CASE WHEN store = 102 THEN sales_day ELSE 0 END) as sales_day_102,
SUM(CASE WHEN store = 102 THEN stock ELSE 0 END) as stock_102
FROM mytable
GROUP BY type, code, "desc"
ORDER BY type, code
The idea is the same, but the newer FILTER clause is replaced by the more widely supported CASE expression.
Notice that "desc" is a reserved word in Postgres, so I strongly recommend renaming your column.
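The rename itself is a one-liner; the new name description is only a suggestion:
ALTER TABLE mytable RENAME COLUMN "desc" TO description;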
I have my data that looks like this:
user_id touchpoint_number days_difference
1 1 5
1 2 20
1 3 25
1 4 10
2 1 2
2 2 30
2 3 4
I would like to add one more column holding a cumulative sum of days_difference, partitioned by user_id, that resets and starts counting from 0 whenever the value reaches 30. I have been trying to do it, but I couldn't figure out how to do it in PostgreSQL, because it has to be recursive.
The outcome I would like to have would be something like:
user_id touchpoint_number days_difference cum_sum_upto30
1 1 5 5
1 2 20 25
1 3 25 0 --- new count all over again
1 4 10 10
2 1 2 2
2 2 30 0 --- new count all over again
2 3 4 4
Do you have any cool ideas how this could be done?
This should do what you want:
with recursive cte as (
    -- anchor: the first touchpoint of each user starts the running sum
    select user_id, touchpoint_number, days_difference,
           days_difference as cum_sum_upto30
    from t
    where touchpoint_number = 1
    union all
    -- recursive step: add the next touchpoint's value, resetting past 30
    select t.user_id, t.touchpoint_number, t.days_difference,
           (case when t.days_difference + cte.cum_sum_upto30 > 30 then 0
                 else t.days_difference + cte.cum_sum_upto30
            end)
    from t join
         cte
         on t.touchpoint_number = cte.touchpoint_number + 1 and
            t.user_id = cte.user_id
)
select *
from cte
order by user_id, touchpoint_number;
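One caveat: the join on t.touchpoint_number = cte.touchpoint_number + 1 assumes the numbering is consecutive within each user_id. If it can have gaps, a row_number() pass can normalize it first; a sketch using the same assumed table name t:
with numbered as (
    select user_id, days_difference,
           row_number() over (partition by user_id
                              order by touchpoint_number) as touchpoint_number
    from t
)
select * from numbered;
The recursive CTE above can then read from numbered instead of t.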
I am new to PySpark and I am trying to do some data manipulation with it.
I have a DataFrame like below example:
Trxn Cust_ID Group
3370 A 1
8809 C 2
3525 B 3
8260 A 3
6349 B 3
3359 C 3
3701 NULL 3
5572 NULL 2
2580 A 1
In this DataFrame, the Trxn values are unique, the Cust_ID values can repeat, and every Cust_ID belongs to some group. I need a final DataFrame with new group columns (Group_1, Group_2, and so on), where each column holds the count of transactions that the row's Cust_ID has in that group. Below is the output example:
Trxn Cust_ID Group Group_1 Group_2 Group_3
3370 A 1 2 0 1
8809 C 2 0 1 1
3525 B 3 0 0 2
8260 A 3 2 0 1
6349 B 3 0 0 2
3359 C 3 0 1 1
3701 NULL 3 0 1 1
5572 NULL 2 0 1 1
2580 A 1 2 0 1
Can someone let me know how to get this exact output in PySpark? Any help or hints would be greatly appreciated.
It seems like you are trying to do a pivot here. This post walks through pivoting in Spark:
https://databricks.com/blog/2016/02/09/reshaping-data-with-pivot-in-apache-spark.html
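If that reading is right, a minimal PySpark sketch of the pivot-and-join approach could look like the following. The DataFrame name df and the hard-coded group list [1, 2, 3] are assumptions taken from the sample data, and the code is untested:
from pyspark.sql import functions as F

# Per-customer transaction counts, pivoted into one column per group value.
counts = (df.groupBy("Cust_ID")
            .pivot("Group", [1, 2, 3])   # listing the values avoids an extra pass
            .count()
            .na.fill(0))
for g in [1, 2, 3]:
    counts = counts.withColumnRenamed(str(g), "Group_{}".format(g))

# Join the counts back onto every transaction row. A null-safe comparison is
# needed because Cust_ID contains NULLs, which a plain equi-join would drop.
counts = counts.withColumnRenamed("Cust_ID", "cid")
result = df.join(counts, df["Cust_ID"].eqNullSafe(F.col("cid")), "left").drop("cid")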
I would like a query that shows a sum of values, with a default for missing data. For example, assume I have a table as follows:
type_lookup:
id name
-----------
1 self
2 manager
3 peer
And a table as follows:
data:
id type_lookup_id value
--------------------------
1 1 1
2 1 4
3 2 9
4 2 1
5 2 9
6 1 5
7 2 6
8 1 2
9 1 1
After running a query I would like a result set as follows:
type_lookup_id value
----------------------
1 13
2 25
3 0
I would like all rows in type_lookup table to be included in the result set - even if they don't appear in the data table.
It's a bit hard to read your data layout, but something like the following should do the trick:
SELECT tl.id AS type_lookup_id, tl.name, coalesce(sum(da.value), 0) AS value
from type_lookup tl
left outer join data da
  on da.type_lookup_id = tl.id
group by tl.id, tl.name
order by tl.id
The left outer join keeps every type_lookup row (type 3 included) even when it has no match in data, and coalesce turns the resulting NULL sum into 0.