PostgreSQL: use a specified value instead of first_value

I am trying to replace first_value with a specified value to use in an equation, because the alphabetical sort is causing an issue. I want to use the value of the row called 'control', which is in the segment column of the direct_mail_test table. So I need a way to reference just that value ('control') to then use in the equation. I'm new to PostgreSQL, so any help would be greatly appreciated.
Here is my current code:
select segment,
count(*) as total,
count(b.c_guid) as bookings,
100.0 * count(b.c_guid)/count(*) as percent,
100.0 * count(b.c_guid) / first_value(count(b.c_guid)) over ( order by segment asc ) as comp
from mailing_tests
left join (
select distinct g.contact_guid as c_guid
from guest g
inner join booking
on booking.guid = g.booking_guid
where booking.book_date >= {{date_start}}
[[and booking.book_date < {{date_end}}]]
and booking.status in ('Booked')
) b
on mailing_tests.guid = b.c_guid
where {{project}}
group by segment
order by segment asc
Here is my output:
segment      total   bookings  percent  comp
catalog      4,091   30        0.73     100
control      30,611  359       1.17     1,196.67
direct_mail  30,611  393       1.28     1,310
online_ads   30,611  371       1.21     1,236.67
As of now it is taking 'catalog' as the baseline measure, and I need it to take 'control' instead.
For more context on the code: I am using Metabase, so {{date_start}}, [[and peak15_booking.book_date < {{date_end}}]], and {{project}} are all Metabase template variables.
I tried nth_value, FETCH, and many others, but I'm not sure I was even using them properly. I have been unsuccessful in finding an answer to this.

I found a way to make this work, but I'm not sure if it's the cleanest way to do this or if I will run into future issues. I still haven't figured out how to replace first_value, but I did add order by segment='control' desc in two places: once in the window in the SELECT list, and again in the final ORDER BY clause. Again, not sure if this is the correct way to do it, but I thought I would show that I did end up figuring it out. I wasn't sure whether I should have answered my own question or deleted it, but this might help someone else.
select segment,
count(*) as total,
count(b.c_guid) as bookings,
100.0 * count(b.c_guid) / count(*) as percent,
100.0 * count(b.c_guid) / first_value(count(b.c_guid)) over ( order by segment='control' desc, segment asc ) as comp
from mailing_tests
left join (
select distinct g.contact_guid as c_guid
from guest g
inner join booking
on booking.guid = g.booking_guid
where booking.book_date >= {{date_start}}
[[and booking.book_date < {{date_end}}]]
and booking.status in ('Booked')
) b
on mailing_tests.guid = b.c_guid
where {{project}}
group by segment
order by segment='control' desc, segment asc
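For what it's worth, there is an alternative that avoids depending on sort order altogether: use an aggregate with a FILTER clause as the window function, so the 'control' row is pinned explicitly. This is only a sketch against the same query (it assumes PostgreSQL 9.4+ for FILTER):
select segment,
count(*) as total,
count(b.c_guid) as bookings,
100.0 * count(b.c_guid) / count(*) as percent,
-- the filtered max() repeats the control segment's bookings count on every output row
100.0 * count(b.c_guid) / max(count(b.c_guid)) filter (where segment = 'control') over () as comp
from mailing_tests
left join (
select distinct g.contact_guid as c_guid
from guest g
inner join booking
on booking.guid = g.booking_guid
where booking.book_date >= {{date_start}}
[[and booking.book_date < {{date_end}}]]
and booking.status in ('Booked')
) b
on mailing_tests.guid = b.c_guid
where {{project}}
group by segment
order by segment asc
If no 'control' row survives the WHERE clause, the denominator is NULL and comp simply comes back NULL instead of raising an error.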

Related

Finding the timeslot with the maximum decrease in count of nearby points

For each entry in the loc_of_interest table, I want to find the 15 minute timeslot (from the data in the other cte) with the maximum decrease in count of nearby points. I do not know how to proceed beyond the 'pseudocode' part, and indeed, am uncertain if I am going in the right direction with this existing code as well.
Here is my code:
-- I have two cte's already made
subset_cr -- many rows of data
(device_id, points_geom, time_created)
loc_of_interest -- 2 rows of data
(loc_id, points_geom)
-- here is how I wish to proceed:
with temp as (
SELECT loi.loc_id AS loc_id,
routes.fifteen_min_slot,
routes.count_of_near_points
FROM loc_of_interest as loi
CROSS JOIN LATERAL (
SELECT date_trunc('hour', r.time_created) + date_part('minute', r.time_created)::int / 15 * interval '15 min' as fifteen_min_slot,
-- count only the rows actually within 100 m; a bare count() over the boolean would count every non-NULL result
count(*) filter (where ST_DWithin(
loi.points_geom::geography,
st_transform(r.points_geom, 4326)::geography,
100)) as count_of_near_points
FROM subset_cr as r
GROUP BY 1
) routes
)
--pseudocode below
for each loc_id
select fifteen_min_slot
from temp
where difference in count_of_near_points is max
Code update:
I have added the following code for the pseudocode I wrote earlier:
, tempy as (
select loc_id, fifteen_min_slot, count_of_near_points - lag(count_of_near_points) over (partition by loc_id order by fifteen_min_slot) as lagging_diff
from temp
)
select loc_id, fifteen_min_slot
from tempy
where lagging_diff = (select max(lagging_diff) from tempy)
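Two caveats about that update: max(lagging_diff) finds the largest increase rather than the largest decrease, and comparing every row against a single global maximum ignores the per-loc_id requirement. If "maximum decrease" means the most negative slot-to-slot difference for each loc_id, a DISTINCT ON query may be closer; a sketch, reusing the tempy CTE above:
select distinct on (loc_id) loc_id, fifteen_min_slot, lagging_diff
from tempy
where lagging_diff is not null -- the first slot of each loc_id has no previous count
order by loc_id, lagging_diff asc -- the most negative difference is the biggest decrease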

PostgreSQL: first date when cumulative sum reaches a mark

I have the following sample table.
The output should be the first date (for each id) when cum_rev reaches the 100 mark.
I tried the following, because I thought that with the GROUP BY trick and the WHERE condition I would only get the first occurrence of a value higher than 100.
SELECT id
,pd
,cum_rev
FROM (
SELECT id
,pd
,rev
,SUM(rev) OVER (
PARTITION BY id
ORDER BY pd
) AS cum_rev
FROM tab1
)
WHERE cum_rev >= 100
GROUP BY id
But it is not working, and I get the following error (adding an alias did not help either):
ERROR:  subquery in FROM must have an alias
LINE 4: FROM (
        ^
HINT:  For example, FROM (SELECT ...) [AS] foo.
So the desired output is:
id  pd          cum_rev
2   2015-04-02  135.70
3   2015-07-03  102.36
Do I need another approach? Can anyone help?
Thanks
demo: db<>fiddle
SELECT
id, total
FROM (
SELECT
*,
SUM(rev) OVER (PARTITION BY id ORDER BY pd) - rev as prev_total,
SUM(rev) OVER (PARTITION BY id ORDER BY pd) as total
FROM tab1
) s
WHERE total >= 100 AND prev_total < 100
You can use the cumulative SUM() window function for each id group (partition). To find the first row that reaches a threshold, you check that the previous running total is still under the threshold while the current one meets it.
PS: You got the error because your subquery is missing an alias. In my example it's just s.
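If only the first qualifying row per id is wanted, DISTINCT ON is a possible alternative to comparing against the previous running total; a sketch against the same tab1:
select distinct on (id) id, pd, cum_rev
from (
select id, pd, sum(rev) over (partition by id order by pd) as cum_rev
from tab1
) s
where cum_rev >= 100
order by id, pd -- the earliest date at or past the mark wins for each id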

Order by two columns with different priority - Postgres

I want to be able to order users by two columns:
Number of followers they have
Number of the same following users that I am following - for similarity
Here's my query for now
SELECT COUNT(fSimilar.id) as similar_follow, COUNT(fCount.id) as followers_count, users.name FROM users
LEFT JOIN follows fSimilar ON fSimilar.user_id = users.id
AND fSimilar.following_id IN (
SELECT following_id FROM follows WHERE user_id = 1 -- 1 is my user id
)
LEFT JOIN follows fCount ON fCount.following_id = users.id
WHERE users.name LIKE 'test%'
GROUP BY users.name
ORDER BY followers_count * 0.3 + similar_follow * 0.7 DESC
This selects people who follow the same people as me and also considers their popularity (number of followers). This is similar to Instagram search.
I prioritise similar_follow at 70% (0.7) and followers_count at 30% (0.3). However, followers_count * 0.3 doesn't preserve a meaningful ordering: some users have 1 to 10 million followers, so followers_count becomes far too large and similar_follow becomes too small to have any impact on the ordering.
I have considered doing followers_count/500, where 500 is the average number of followers, but this still doesn't order well.
I need a way to put followers_count and similar_follow on the same scale, so that the percentage weights (0.3 and 0.7) make a difference for both values.
I also looked at https://medium.com/hacking-and-gonzo/how-reddit-ranking-algorithms-work-ef111e33d0d9#.wuz8j0f4w which describes Wilson score interval but I am not sure if this is the right solution in my case, as I deal with 2 values (I might be wrong).
Thank you.
I usually use LOG() when normalizing data that has a large range. Also, to reiterate @Abelisto, weighting each column the way your implementation does won't work; adding the two log-scaled values together should.
For example:
...
ORDER BY LOG(followers_count) * 0.3 + LOG(similar_follow) * 0.7 DESC
What about raising them to exponents instead (i.e. similar_follow ^ 3.0 and followers_count ^ 1.5)?
Reference: https://www.postgresql.org/docs/9.1/static/functions-math.html
Thanks @ForRealHomie, I implemented a query that works. I'm still open to other suggestions :)
SELECT
users.id, users.name,
fSimilar.count + fPopular.count as followCount
FROM users
LEFT JOIN usernames ON usernames.user_id=users.id
LEFT JOIN (
SELECT LOG(COUNT(user_id) + 1) * 0.7 as count, user_id
FROM follows
WHERE username_id IN (SELECT username_id FROM follows WHERE user_id=1)
GROUP BY user_id
) fSimilar ON fSimilar.user_id = users.id
LEFT JOIN (SELECT LOG(COUNT(username_id) + 1) * 0.3 as count, username_id
FROM follows
GROUP BY username_id
) fPopular ON fPopular.username_id = usernames.id
WHERE users.id IN (2, 3 ,4)
ORDER BY followCount DESC
NB: in LOG(COUNT(...) + 1), the + 1 is needed to cope with the zeros that COUNT can produce, because LOG isn't defined for 0; adding 1 avoids the error :)
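One more caveat, as an aside: with the two LEFT JOINs, fSimilar.count or fPopular.count is NULL for users with no matching rows, which makes the whole followCount NULL (and NULLs sort first under ORDER BY ... DESC in PostgreSQL). A COALESCE guard on that line of the SELECT list avoids it, e.g.:
-- treat a missing match from either subquery as a zero score rather than NULL
COALESCE(fSimilar.count, 0) + COALESCE(fPopular.count, 0) as followCount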

Trouble calculating a field while creating a view in PostgreSQL

I have two tables, q1data and q1lookup, in a Postgres database. q1data contains 3 columns (postid, reasonid, other) and q1lookup contains 2 columns (reasonid, reason).
I am trying to create a view with 4 columns (reasonid, reason, count, percentage). count is the count of each reason, and percentage should be each count divided by the total count(*) from q1data (i.e. the total number of rows).
But it gives an error saying there is a syntax error near count(*). The following is the code I am using. Please help.
select
cwfis_web.q1data.reasonid AS reasonid,
cwfis_web.q1lookup.reason AS reason,
count(cwfis_web.q1data.reasonid) AS count,
round(
(
(
count(cwfis_web.q1data.reasonid)
/
(select count(0) AS count(*) from cwfis_web.q1data)
) * 100
)
,0) AS percentage
from
cwfis_web.q1data
join
cwfis_web.q1lookup
ON cwfis_web.q1data.reasonid = cwfis_web.q1lookup.reasonid
group by
cwfis_web.q1data.reasonid;
Firstly, you have a completely invalid piece of syntax there: count(0) AS count(*). Replacing that with a plain count(*), and adding the missing GROUP BY entry for reason, gives this:
select
cwfis_web.q1data.reasonid AS reasonid,
cwfis_web.q1lookup.reason AS reason,
count(cwfis_web.q1data.reasonid) AS count,
round(
(
(
count(cwfis_web.q1data.reasonid)
/
(select count(*) from cwfis_web.q1data)
) * 100
)
,0) AS percentage
from
cwfis_web.q1data
join
cwfis_web.q1lookup
ON cwfis_web.q1data.reasonid = cwfis_web.q1lookup.reasonid
group by
cwfis_web.q1data.reasonid,
cwfis_web.q1lookup.reason;
However, as this live demo shows, this doesn't give the right value for percentage: count(cwfis_web.q1data.reasonid) and (select count(*) from cwfis_web.q1data) are both integer types, so integer division is performed and the result is truncated to 0.
If you cast these to numeric (the expected argument type of the 2-parameter round() function), you get this:
select
cwfis_web.q1data.reasonid AS reasonid,
cwfis_web.q1lookup.reason AS reason,
count(cwfis_web.q1data.reasonid) AS count,
round(
(
(
count(cwfis_web.q1data.reasonid)::numeric
/
(select count(*) from cwfis_web.q1data)::numeric
) * 100
)
,0) AS percentage
from
cwfis_web.q1data
join
cwfis_web.q1lookup
ON cwfis_web.q1data.reasonid = cwfis_web.q1lookup.reasonid
group by
cwfis_web.q1data.reasonid,
cwfis_web.q1lookup.reason;
As this live demo shows, that gives something much more like what you were hoping for. (Alternatively, you can cast to float and lose the ,0 argument to round(), as in this demo.)
Try changing your subquery from
select count(0) AS count(*) from cwfis_web.q1data
to
select count(0) from cwfis_web.q1data
Also you need to add cwfis_web.q1lookup.reason to group by.
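A further option is to fold the grand total into the same query with a window function over the grouped rows, which drops the scalar subquery entirely; multiplying by 100.0 also forces numeric division. This is just a sketch, and it assumes every q1data row has a matching q1lookup row, so the joined total equals the table total:
select
cwfis_web.q1data.reasonid AS reasonid,
cwfis_web.q1lookup.reason AS reason,
count(*) AS count,
-- sum(count(*)) over () totals the per-reason counts across all grouped rows
round(100.0 * count(*) / sum(count(*)) over (), 0) AS percentage
from
cwfis_web.q1data
join
cwfis_web.q1lookup
ON cwfis_web.q1data.reasonid = cwfis_web.q1lookup.reasonid
group by
cwfis_web.q1data.reasonid,
cwfis_web.q1lookup.reason;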

T-SQL: if a value exists use it, otherwise use the value before

I have the following table
Account#   Period   Balance
12345      200901   $11554
12345      200902   $4353
12345      201004   $34
12345      201005   $44
12345      201006   $1454
45677      200901   $14454
45677      200902   $1478
45677      201004   $116776
45677      201005   $996
56789      201006   $1567
56789      200901   $7894
56789      200902   $123
56789      201003   $543345
56789      201005   $114
56789      201006   $54
I want to select the account#s that have a period of 201005.
This is fairly easy using the code below. The problem is that if a user enters 201003 (which doesn't exist for most accounts), I want the query to select the previous value. NOTE that there is one account# that does have a 201003 period, and I still want to select it too.
I tried CASE, IF ELSE, and IN, but I was unsuccessful.
PS: I cannot create temp tables due to a system limitation of 5000 rows.
Thank you.
DECLARE @INPUTPERIOD INT
SET @INPUTPERIOD = 201005

SELECT ACCOUNT#, PERIOD, BALANCE
FROM TABLE1
WHERE PERIOD = @INPUTPERIOD
SELECT t.ACCOUNT#, t.PERIOD, t.BALANCE
FROM (SELECT ACCOUNT#, MAX(PERIOD) AS MaxPeriod
FROM TABLE1
WHERE PERIOD <= @INPUTPERIOD
GROUP BY ACCOUNT#) q
INNER JOIN TABLE1 t
ON q.ACCOUNT# = t.ACCOUNT#
AND q.MaxPeriod = t.PERIOD
select top 1 account#, period, balance
from table1
where period <= @inputperiod
order by period desc
; WITH Base AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY Period DESC) RN FROM #MyTable WHERE Period <= 201003
)
SELECT * FROM Base WHERE RN = 1
Using a CTE and ROW_NUMBER(): we take all the rows with Period <= the selected date and keep the top one (the one with the auto-generated ROW_NUMBER() = 1).
; WITH Base AS
(
SELECT *, 1 AS RN FROM #MyTable WHERE Period = 201003
)
, Alternative AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY Period DESC) RN FROM #MyTable WHERE NOT EXISTS(SELECT 1 FROM Base) AND Period < 201003
)
, Final AS
(
SELECT * FROM Base
UNION ALL
SELECT * FROM Alternative WHERE RN = 1
)
SELECT * FROM Final
This one is a lot more complex but does nearly the same thing; it is more "imperative-like". It first tries to find a row with the exact Period, and if none exists it does the same thing as before. At the end it unites the two result sets (one of the two is always empty). I would always use the first version, unless profiling showed the SQL engine wasn't able to handle what I'm trying to do; then I would try the second one.
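If the lookup is needed per account rather than as a single row (as the sample data suggests), the same ROW_NUMBER() idea can be partitioned by account; a sketch, assuming the question's TABLE1 and @INPUTPERIOD:
; WITH Base AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY ACCOUNT# ORDER BY PERIOD DESC) RN
FROM TABLE1
WHERE PERIOD <= @INPUTPERIOD
)
SELECT ACCOUNT#, PERIOD, BALANCE FROM Base WHERE RN = 1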