I want to create a Balance column, computed over (partition by ID) from Add and Subtract and ordered by Date, as below:
ID Date Add Subtract Balance
a 2019/01/01 500 0 500
a 2019/01/02 0 300 200
b 2019/03/01 800 0 800
b 2019/03/10 300 0 1100
I saw the solution once but cannot find it again. As I remember, it needs COALESCE, LAG, and LEAD.
Please help, or give me a link to the related question.
With the data you have, you need only the sum() window function:
select *,
sum(add - subtract)
over (partition by id
order by date) as balance
from your_table
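As a quick sanity check, the sketch below runs the same window query against the sample rows using Python's built-in sqlite3 module. It assumes the bundled SQLite is 3.25 or newer, which is when window functions arrived. The table name your_table comes from the answer. Quoting "add", "subtract" and "date" is an assumption, added so those column names cannot clash with SQL keywords. With ORDER BY and no explicit frame, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so two rows with the same id and date would receive the same balance.

```python
import sqlite3

def running_balance(rows):
    """Return (id, date, balance) rows computed with SUM(...) OVER (...)."""
    con = sqlite3.connect(":memory:")
    # Column names are quoted so they cannot clash with SQL keywords.
    con.execute('create table your_table (id text, "date" text, "add" integer, "subtract" integer)')
    con.executemany("insert into your_table values (?, ?, ?, ?)", rows)
    query = """
        select id, "date",
               sum("add" - "subtract") over (partition by id order by "date") as balance
        from your_table
        order by id, "date"
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

# The sample rows from the question (zero-padded text dates sort correctly).
sample = [
    ("a", "2019/01/01", 500, 0),
    ("a", "2019/01/02", 0, 300),
    ("b", "2019/03/01", 800, 0),
    ("b", "2019/03/10", 300, 0),
]

for row in running_balance(sample):
    print(row)
```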
UserID | CalMonth | ActiveFlag | Months_since_last_active
A | 1/1/2021 | 1 | 0
A | 2/1/2021 |   | 1
A | 3/1/2021 |   | 2
A | 4/1/2021 | 1 | 0
B | 1/1/2021 | 1 | 0
B | 2/1/2021 |   | 1
B | 3/1/2021 | 1 | 0
Problem: the first 3 columns are given. Generate the last one, 'Months_since_last_active', by adding 1 until the user is active again.
My solution is below:
With active_sessions as (
Select
User_Id
, CalMonth
, ActiveFlag as current_flag
, LAG (ActiveFlag,1) over (partition by User_Id order by CalMonth) as previous_flag
)
Select User_Id, CalMonth, current_flag, sum(case when current_flag =1 then 0
when current_flag IS NULL then Months_since_last_active + 1
END
) as Months_since_last_active
from active_sessions
order by 1,2
I was asked the above question in an interview and told that my proposed solution would not work because:
When it comes to 3/1/2021 and beyond, the previous values of 'Months_since_last_active' are not in the table yet -- they are only in the code
If I wanted to use the LAG function, it would take innumerable LAG calls to achieve what I was trying to do
I would appreciate it if someone could comment on my solution.
Your solution has 3 major problems; 2 of them may be copy/paste errors. The active_sessions CTE is missing its FROM clause, so there is no data source. Then the main query uses the aggregate function SUM, but it has no GROUP BY, which is required when an aggregate is selected alongside non-aggregated columns. These are easily corrected. The other issue concerns the LAG function and your use of it.
First off, in the CTE you alias the LAG result as previous_flag, but in the main query you reference Months_since_last_active, which does not exist yet. I think this is the source of the interviewer's first point.
The interviewer's second point also stems from the LAG function. As written, it always looks back exactly 1 row from the current row, yet it needs to look back 2 rows for (userid, calmonth) = ('A', 2021-03-01), 3 rows for ('A', 2021-04-01), etc. Basically, you need to look back to the last row with active_flag = 1. This leads directly to "it'd take innumerable LAG functions", as you do not know how far back you need to look. Suppose you had 30-40 or more inactive rows between active rows: you would need a LAG(activeflag, n) ... for each possibility.
A solution. I dislike the problem statement: it should not contain "by adding 1 until the user is active again" (is that wording yours or theirs?). Either way, this is an XY problem. If it is theirs, they should be telling you what to solve, i.e. find the number of months since the user was last active. If it is yours, you have created the problem for yourself. A problem statement should not say anything about how to solve it. I will ignore that portion of the problem (and in a real interview I would ignore it, and have done so, but be prepared to explain why).
What you have is a version of Gaps and Islands (google it; you will find more than enough to think about). In this version, let's consider each row with activeflag = 1 as an island, and anything else as a gap. Now what you are looking for is the length of the gaps between islands. In the following, the island_num CTE does 2 things: it assigns a sequence number to each row within a userid, ordered by calmonth, and it generates a boolean marking each island. The gap_points CTE then joins the results with itself through a lateral join, picking the row number of the latest island (max row_num) whose calmonth is earlier than the current row's calmonth. In the main query, Months_since_last_active is 0 if the current row is an island, and the difference between the generated row numbers if it is a gap (this assumes one row per user per month). (see demo)
with island_num (userid, cal_month, active_flag, is_island, row_num) as
( select am.*
, case when am.activeflag = 1 then true else false end is_island
, row_number() over (partition by am.userid order by am.calmonth) rn
from active_month am
) -- select * from island_num
, gap_points(userid, cal_month, active_flag, is_island, row_num, island_row) as
( select *
from island_num i1
join lateral
(select max(row_num)
from island_num i2
where i1.userid = i2.userid
and i2.cal_month < i1.cal_month
and i2.is_island
) s0
on true
) --select * from gap_points;
select userid "User Id"
, cal_month "Cal Month"
, active_flag "Active Flag"
, case when is_island then 0
else row_num - island_row
end "Months_since_last_active"
from gap_points;
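SQLite has no JOIN LATERAL, so the sketch below swaps the gap_points lateral join for an equivalent correlated scalar subquery. It runs the whole thing through Python's sqlite3 module, assuming SQLite 3.25+ for row_number(). The table active_month and its columns follow the answer. Storing CalMonth as ISO date strings and inactive months as NULL are assumptions: ISO strings make text ordering match date ordering, and NULL reproduces the blank flags in the sample table.

```python
import sqlite3

# Sample data from the question; None (NULL) ActiveFlag means "inactive that month".
ROWS = [
    ("A", "2021-01-01", 1), ("A", "2021-02-01", None),
    ("A", "2021-03-01", None), ("A", "2021-04-01", 1),
    ("B", "2021-01-01", 1), ("B", "2021-02-01", None),
    ("B", "2021-03-01", 1),
]

QUERY = """
with island_num as (
    select am.userid, am.calmonth as cal_month, am.activeflag as active_flag,
           coalesce(am.activeflag = 1, 0) as is_island,   -- 1 for islands, 0 for gaps
           row_number() over (partition by am.userid order by am.calmonth) as row_num
    from active_month am
),
gap_points as (
    -- correlated scalar subquery standing in for PostgreSQL's JOIN LATERAL
    select i1.*,
           (select max(i2.row_num)
              from island_num i2
             where i2.userid = i1.userid
               and i2.cal_month < i1.cal_month
               and i2.is_island) as island_row
    from island_num i1
)
select userid, cal_month, active_flag,
       case when is_island then 0 else row_num - island_row end as months_since_last_active
from gap_points
order by userid, cal_month
"""

def months_since_last_active(rows):
    con = sqlite3.connect(":memory:")
    con.execute("create table active_month (userid text, calmonth text, activeflag integer)")
    con.executemany("insert into active_month values (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

for row in months_since_last_active(ROWS):
    print(row)
```

A gap with no earlier island for that user (none in this sample) would come back as NULL, because max() over an empty set is NULL.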
I have a table of real estate properties, and I want to create a table that shows the count of properties that fall in certain price ranges, by zone, something like this:
Zone 0-149k 150-300k
North 25 150
South 150 350
For example for the first result my query would be:
SELECT COUNT(*) FROM MY_TABLE
WHERE ZONE = 'North' AND PRICE < 150000
and similar for the other fields
But I'm unable to find a single query that shows the data in the desired layout. I've tried UNION, but that returns all the data as continuous rows. Any thoughts?
You can use a FILTER on an aggregate:
SELECT
zone,
COUNT(*) FILTER (WHERE price < 150000) AS "0-149k",
COUNT(*) FILTER (WHERE 150000 <= price AND price < 300000) AS "150-300k"
FROM my_table
GROUP BY zone;
(If you have an unknown number of price ranges, see this approach).
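FILTER on an aggregate is PostgreSQL syntax that SQLite only gained in 3.30, so the sketch below uses the portable COUNT(CASE WHEN ... THEN 1 END) spelling of the same conditional aggregation. COUNT ignores the NULLs the CASE produces for non-matching rows. It runs through Python's sqlite3 module, and the rows are made up for illustration.

```python
import sqlite3

def price_ranges_by_zone(rows):
    con = sqlite3.connect(":memory:")
    con.execute("create table my_table (zone text, price integer)")
    con.executemany("insert into my_table values (?, ?)", rows)
    # count(case ... then 1 end) only counts rows where the CASE is non-NULL,
    # which is equivalent to COUNT(*) FILTER (WHERE ...).
    query = """
        select zone,
               count(case when price < 150000 then 1 end) as "0-149k",
               count(case when 150000 <= price and price < 300000 then 1 end) as "150-300k"
        from my_table
        group by zone
        order by zone
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

# Hypothetical rows, not the asker's data.
sample = [("North", 120000), ("North", 200000), ("North", 250000),
          ("South", 90000), ("South", 140000), ("South", 310000)]
print(price_ranges_by_zone(sample))
```

The 310000 row falls into neither range, so it is simply not counted in either column.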
First off, I'm a total SQL noob. Thanks in advance for any assistance you can offer.
I have a FortiAnalyzer that uses a Postgres DB to store firewall logs. The Analyzer is then used to report on usage etc.
Basically I need to write a custom query that can show the Top 10 Users by bandwidth used for the top 10 Websites/destinations per user.
I can get all of the relevant information out of the unit, but I cannot get the output formatted correctly.
I would be happy with the output showing a username 10 times with the top 10 sites next to the username. First prize, however, would be to show the username only once in column A, with the destination address and bandwidth used in columns B and C respectively.
Here is the query I have so far:
select coalesce(nullifna(`user`), `src`) as user_src,
coalesce(hostname, dstname, 'unknown') as web_site,
sum(rcvd + sent)/1024 as bandwidth from $log
where $filter and user is not null and status in ('passthrough', 'filtered')
group by `user_src` , web_site order by user_src desc
Once the query is linked to a report chart, I then have options to limit the output by an x value. I could, for example, limit the user_src column to 100 rows (i.e. 10 users with 10 outputs each).
I hope this is clear to you... If not, I will do my best to answer any questions.
I start with a table aggregated at the (web_site, user_src) level. From there it is not difficult to get the top websites for the top users; you will need window functions to get the desired result.
Sample data:
create table test (web_site varchar, user_src varchar, bandwidth numeric);
insert into test values
('a','s1',18),
('b','s1',12),
('c','s1',13),
('d','s2',14),
('e','s2',15),
('f','s2',16),
('g','s3',17),
('h','s3',18),
('i','s3',19)
;
Get top X websites for top Y users:
with cte as (
select
user_src,
web_site,
bandwidth,
dense_rank() over(order by site_bandwidth desc) as user_rank,
dense_rank() over(partition by user_src order by bandwidth desc) as website_rank
from
test
join (select user_src, sum(bandwidth) site_bandwidth from test group by user_src) a using (user_src)
)
select
*
from
cte
where
user_rank <= 2
and website_rank <=2
order by
user_rank,
website_rank
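The same query can be exercised end to end with Python's sqlite3 module against the sample data above, assuming SQLite 3.25+ for dense_rank(). In this sketch the two limits are passed as parameters instead of the hard-coded 2, and the function name top_sites_for_top_users is hypothetical.

```python
import sqlite3

SAMPLE = [("a", "s1", 18), ("b", "s1", 12), ("c", "s1", 13),
          ("d", "s2", 14), ("e", "s2", 15), ("f", "s2", 16),
          ("g", "s3", 17), ("h", "s3", 18), ("i", "s3", 19)]

QUERY = """
with cte as (
    select user_src, web_site, bandwidth,
           -- rank users by their total bandwidth across all sites
           dense_rank() over (order by site_bandwidth desc) as user_rank,
           -- rank each user's sites by that site's bandwidth
           dense_rank() over (partition by user_src order by bandwidth desc) as website_rank
    from test
    join (select user_src, sum(bandwidth) as site_bandwidth
          from test group by user_src) a using (user_src)
)
select user_src, web_site, bandwidth, user_rank, website_rank
from cte
where user_rank <= ? and website_rank <= ?
order by user_rank, website_rank
"""

def top_sites_for_top_users(top_users, top_sites):
    con = sqlite3.connect(":memory:")
    con.execute("create table test (web_site text, user_src text, bandwidth numeric)")
    con.executemany("insert into test values (?, ?, ?)", SAMPLE)
    result = con.execute(QUERY, (top_users, top_sites)).fetchall()
    con.close()
    return result

for row in top_sites_for_top_users(2, 2):
    print(row)
```

dense_rank() gives tied users or sites the same rank, so a tie can return more than the requested number of rows. Use row_number() if a hard cap matters more than fairness to ties.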
How to get row number in PostgreSQL when the results are ordered by some column?
e.g.
SELECT 30+row_number() AS position, *
FROM users
ORDER BY salary DESC
LIMIT 30
OFFSET 30
I expected the query to return a list like this:
position | name | salary
31 | Joy | 4500
32 | Katie| 4000
33 | Frank| 3500
Actually, I have to duplicate the ORDER BY clause inside row_number() to make it work:
SELECT 30+row_number(ORDER BY salary DESC) AS position, *
FROM users
ORDER BY salary DESC
LIMIT 30
OFFSET 30
Is there any other way to return ordered and numbered results without duplicating the code?
I know this can be solved by incrementing a variable in the app itself, but I want to do this at the database layer and return already-numbered results to the app...
No. The ORDER BY in the window function and the ORDER BY clause of the SELECT statement are functionally two different things.
Also, your statement produces: ERROR: window function call requires an OVER clause, so:
SELECT 30+row_number(ORDER BY salary DESC) AS position, * FROM users ORDER BY salary DESC LIMIT 30 OFFSET 30
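should be (row_number() is evaluated over the whole result before LIMIT and OFFSET are applied, so the numbering already continues from 31 and the 30+ is not needed):
SELECT row_number() OVER(ORDER BY salary DESC) AS position, * FROM users ORDER BY salary DESC LIMIT 30 OFFSET 30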
Note that if salaries are not unique then there is no guarantee that they will even produce the same order. Perhaps it would be better to do:
SELECT *
FROM ( SELECT row_number() OVER(ORDER BY salary DESC) AS position, *
FROM users ) AS numbered
ORDER BY position LIMIT 30 OFFSET 30
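To see that row_number() is computed before LIMIT and OFFSET, the sketch below pages through a small hypothetical users table with Python's sqlite3 module (SQLite 3.25+ assumed). The salaries are distinct and the same ORDER BY is used inside and outside the window. Under those conditions, the second page of three rows comes back already numbered 4, 5, 6, with no offset added by hand.

```python
import sqlite3

def numbered_page(limit, offset):
    con = sqlite3.connect(":memory:")
    con.execute("create table users (name text, salary integer)")
    # Ten hypothetical users with distinct salaries, so the ordering is deterministic.
    con.executemany("insert into users values (?, ?)",
                    [(f"user{i}", 1000 * (10 - i)) for i in range(10)])
    query = """
        select row_number() over (order by salary desc) as position, name, salary
        from users
        order by salary desc
        limit ? offset ?
    """
    result = con.execute(query, (limit, offset)).fetchall()
    con.close()
    return result

print(numbered_page(3, 3))
```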
Also note that if you are running this query several times with different offsets, you need to:
set your isolation level to serializable
make sure that whatever you are ordering by is unique
or you may get duplicates and missing rows. See the comments on this answer for why.
Hey guys,
So I have this report that I am grouping into different age buckets. I want the count of an age bucket to be zero if there are no rows associated with this age bucket. So I did an outer join in my database select and that works fine. However, I need to add a group based on another column in my database.
When I add this group, the age buckets that had no rows associated with them disappear. I thought it might have been because the column I was trying to group by was null for that row, so I added a row number to my select and then grouped by that (I basically just need to group by each row, and I can't just put it in the details... I can explain more about this if necessary). But after adding the row number, the age buckets that have no data are still null! When I remove the group I added, I get all age buckets.
Any ideas? Thanks!!
It's because the outer join to age group is not also an outer join to whatever your other group is; you are only guaranteed to have one of each age group per data set, not one of each age group per [other group].
So if, for example, your other group is Region, you need a Cartesian / Cross join from your age range table to a Region table (so that you get every possible combination of age range and region), before outer joining to the rest of your dataset.
EDIT - based on the comments, a query like the following should work:
select date_helper.date_description, c.case_number, e.event_number
from
(select 0 range_start, 11 range_end, '0-10 days' date_description from dual union
select 11, 21, '11-20 days' from dual union
select 21, 31, '21-30 days' from dual union
select 31, 99999, '31+ days' from dual) date_helper
cross join case_table c
left outer join event_table e
on e.event_date <= sysdate - date_helper.range_start
and e.event_date > sysdate - date_helper.range_end
and c.case_number = e.case_number
(assuming that it's the event_date that needs to be grouped into buckets.)
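The cross-join idea from the answer can be shown with the Region example. The sketch below builds every (age range, region) pair first and only then left-joins the detail rows, so empty combinations come back with a count of 0. The tables age_range, region and person and their contents are hypothetical. The sketch runs on Python's sqlite3 module rather than Oracle, so the range bounds are plain integer ages instead of sysdate arithmetic.

```python
import sqlite3

def bucket_counts():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        create table age_range (label text, lo integer, hi integer);  -- hi is exclusive
        create table region (name text);
        create table person (region text, age integer);
        insert into age_range values ('0-17', 0, 18), ('18-64', 18, 65), ('65+', 65, 200);
        insert into region values ('North'), ('South');
        insert into person values ('North', 10), ('North', 30), ('South', 70);
    """)
    query = """
        select r.name, a.label, count(p.age) as people  -- count(col) skips the NULLs from unmatched pairs
        from age_range a
        cross join region r                             -- every (age range, region) pair
        left join person p
               on p.region = r.name
              and p.age >= a.lo and p.age < a.hi
        group by r.name, a.label, a.lo
        order by r.name, a.lo
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

for row in bucket_counts():
    print(row)
```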
I had trouble understanding your question.
I do know that Crystal Reports' NULL support is lacking in some pretty fundamental ways. So I usually try not to depend on it.
One way to approach this problem is to hard-code age ranges in the database query, e.g.:
SELECT p.person_type
, SUM(CASE WHEN
p.age <= 2
THEN 1 ELSE 0 END) AS "0-2"
, SUM(CASE WHEN
p.age BETWEEN 3 AND 17
THEN 1 ELSE 0 END) AS "3-17"
, SUM(CASE WHEN
p.age >= 18
THEN 1 ELSE 0 END) AS "18_and_over"
FROM person p
GROUP BY p.person_type
This way you are sure to get zeros where you want zeros.
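Assuming whole-number ages, the three CASE ranges only partition the data if the middle one is BETWEEN 3 AND 17. BETWEEN is inclusive at both ends, so BETWEEN 2 AND 18 would count ages 2 and 18 twice. The sketch below checks the boundary values with Python's sqlite3 module, and the person rows are made up.

```python
import sqlite3

def age_buckets(rows):
    con = sqlite3.connect(":memory:")
    con.execute("create table person (person_type text, age integer)")
    con.executemany("insert into person values (?, ?)", rows)
    query = """
        select p.person_type,
               sum(case when p.age <= 2 then 1 else 0 end) as "0-2",
               sum(case when p.age between 3 and 17 then 1 else 0 end) as "3-17",
               sum(case when p.age >= 18 then 1 else 0 end) as "18_and_over"
        from person p
        group by p.person_type
        order by p.person_type
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

# Hypothetical rows chosen to sit on the bucket boundaries.
sample = [("patient", 2), ("patient", 3), ("patient", 17), ("patient", 18), ("visitor", 40)]
print(age_buckets(sample))
```

Every person is counted in exactly one column, and a person_type with nobody in a range still gets an explicit 0.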
I realize that this is not a direct answer to your question. Best of luck.